Introduction


The differential equations which model the action of selection
and recombination are nonlinear and cannot be solved explicitly.
It is difficult even to describe in general the qualitative
behavior of solutions. Recently, Shahshahani began using
differential geometry to study these equations [28]. With this
monograph I hope to show that his ideas illuminate many aspects of
population genetics. Among these are his proof and clarification of
Fisher's Fundamental Theorem of Natural Selection and Kimura's
Maximum Principle and also the effect of recombination on entropy.
We also discover the relationship between two classic measures of
genetic distance: the $\chi^2$ measure and the arc-cosine measure.

There are two large applications. The first is a precise 
definition of the biological concept of degree of epistasis which 
applies to general (i.e. frequency dependent) forms of selection. 

The second is the unexpected appearance of cycling. We show that 
cycles can occur in the two-locus-two-allele model of selection plus 
recombination even when the fitness numbers are constant (i.e. no 
frequency dependence). 

This work is addressed to two different kinds of readers, which
accounts for its mode of organization.

For the biologist, Chapter I contains a description of the
entire work with brief indications of proof for the harder results.
I imagine a reader with some familiarity with linear algebra and
systems of differential equations. Ideal background is Hirsch and
Smale's text [15]. In Section 3 we introduce what manifold theory is
necessary together with a review of the underlying linear algebra and
calculus.

The remaining Chapters are more demanding, though the epistasis
examples and discussion of position effects in Chapter III are worth
a look.

For the mathematician, the technical Chapters II and IV are
the heart of the work with Chapter I serving as an introduction and
biological orientation. However, some acquaintance with the
rudiments of genetics is needed. I recommend "An Introduction to
Genetics" by Sturtevant and Beadle (Dover, 1962). This is a reprint
of a book published in 1939 and so is uncluttered by the fallout
of the recent explosive growth of the field.

Here I would like to thank Ms. Kate March for her typing of 
the manuscript (twice) and the NSF for their support of this work. 



We consider a large population of diploid organisms among whose
gametes we distinguish n different types, indexed by a set I. So
we describe a member of the population by telling its genotype, a pair
ij (= ji) with i and j elements of I. We can describe the
population by telling the frequencies of the different genotypes,
$x_{ij}$ = the number of organisms with genotype ij. The information
in this frequency table is equivalently described by the total
population number $\sum x_{ij}$ and the distribution of diploid types
$\{p_{ij}\}$, where $p_{ij}$ is the fraction of the total population
having genotype ij. The diploid zygotes which make up the population
are obtained by the pairing of haploid gametes. We will assume that
this pairing is random in the Hardy-Weinberg sense. This means that
the two gametes in the zygote are independent of one another. It is
then sufficient to know the distribution of the haploid gamete types,
$\{p_i\}$, and their total number, which we will denote by $|x|$,
because $p_{ij} = 2p_ip_j$ $(i \neq j)$, $p_{ii} = p_i^2$ and
$\sum x_{ij} = |x|/2$. If we let $\mathbb{R}^I$ denote the
n-dimensional vector space of real valued functions on I then the
gamete distribution is a vector p in the simplex
$\Delta = \{p \in \mathbb{R}^I : p_i \geq 0 \text{ and } \sum p_i = 1\}$.
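
The Hardy-Weinberg relations above can be illustrated with a minimal Python sketch. The function name `zygote_distribution` and the three gamete types are hypothetical, chosen only for illustration; the computation is the rule $p_{ij} = 2p_ip_j$ for $i \neq j$ and $p_{ii} = p_i^2$.

```python
from itertools import combinations_with_replacement

def zygote_distribution(p):
    """Hardy-Weinberg frequencies of the unordered genotypes ij.

    p maps each gamete type i to its frequency p_i (the values sum to 1).
    Returns p_ij = 2 p_i p_j for i != j and p_ii = p_i ** 2.
    """
    types = sorted(p)
    dist = {}
    for i, j in combinations_with_replacement(types, 2):
        dist[(i, j)] = p[i] ** 2 if i == j else 2 * p[i] * p[j]
    return dist

# Hypothetical example with three gamete types.
gametes = {"a": 0.5, "b": 0.3, "c": 0.2}
zygotes = zygote_distribution(gametes)
```

With n gamete types there are n(n+1)/2 unordered genotypes, and their frequencies again sum to one.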


The genes of the gametes occur on the chromosomes. At each of
$\ell$ different positions, or loci, on the chromosomes are the genes
which in the zygote will determine its biological characteristics.
For the $\alpha$ position ($\alpha = 1,\ldots,\ell$) the $n_\alpha$
different possible genes which can occur constitute a finite set
$I_\alpha$. Thus, $I_\alpha$ is the set of alleles at the $\alpha$
locus. A haploid genotype i is a list of $\ell$ choices
$i_\alpha \in I_\alpha$ for $\alpha = 1,\ldots,\ell$. So the set I of
genotypes is the Cartesian product $I = \prod_{\alpha=1}^{\ell} I_\alpha$.
The number of genotypes n is the product $\prod_{\alpha=1}^{\ell} n_\alpha$.

Now let $\mathbb{R}_\alpha$ denote the space $\mathbb{R}^{I_\alpha}$
and $\Delta_\alpha$ be the corresponding simplex. The gamete
distribution $p \in \Delta$ is a probability distribution on the
product I. It induces a distribution $p^\alpha \in \Delta_\alpha$,
namely the marginal distribution on the factor $I_\alpha$. If
$k \in I_\alpha$ then $p^\alpha_k$ (also written $p^\alpha(k)$) is the
probability that a random gamete has gene k at the $\alpha$ locus. The
map $E^\alpha(p) = p^\alpha$ from $\Delta$ to $\Delta_\alpha$ is the
restriction of the linear mapping
$E^\alpha : \mathbb{R}^I \to \mathbb{R}_\alpha$ defined by

(0.1)   $E^\alpha(x)(k) = \sum \{x(i) : \text{for all } i \text{ with } i_\alpha = k\} \qquad (k \in I_\alpha).$

Note that we use $x_i$ and $x(i)$ interchangeably for notational
convenience.

This just means that the probability that k occurs at the $\alpha$
locus is the sum of probabilities $p_i$ where the sum is taken over all
genotypes with $i_\alpha = k$.
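
The marginal map (0.1) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's notation: genotypes are represented as tuples, loci are numbered from 0 rather than 1, and the hypothetical function `marginal` accepts a list of loci so that a single locus $\alpha$ is the case `[alpha]`.

```python
def marginal(p, loci):
    """Sum p(i) over all genotypes i that agree with k on the given loci.

    p maps genotype tuples to probabilities; loci is a list of positions
    (numbered from 0, an assumption of this sketch).
    """
    out = {}
    for genotype, prob in p.items():
        k = tuple(genotype[a] for a in loci)
        out[k] = out.get(k, 0.0) + prob
    return out

# Hypothetical two-locus, two-allele gamete distribution.
p = {("A", "B"): 0.4, ("A", "b"): 0.1, ("a", "B"): 0.2, ("a", "b"): 0.3}
p_first = marginal(p, [0])   # allele frequencies at the first locus
p_second = marginal(p, [1])  # allele frequencies at the second locus
```

Because the map only adds up coordinates, it is linear in p, and it sends probability distributions to probability distributions.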

More generally, if S is any subset of the set of loci
$L = \{1,\ldots,\ell\}$, let $I_S$ be the product of the factors
$I_\alpha$ for $\alpha$ in S. So $I_S = \prod_{\alpha \in S} I_\alpha$
is the collection of partial genotypes obtained by ignoring all but
the loci in S. For $i \in I$ let $i_S$ denote the projection of i to
$I_S$. So $(i_S)_\alpha = i_\alpha$ for all $\alpha \in S$. Define
$\mathbb{R}_S = \mathbb{R}^{I_S}$ and $\Delta_S$ to be the
corresponding simplex. p induces a distribution $p^S = E^S(p)$ on the
subproduct $I_S$. $E^S : \Delta \to \Delta_S$ is the restriction of
the linear map $E^S : \mathbb{R}^I \to \mathbb{R}_S$ defined by:

(0.2)   $E^S(x)(k) = \sum \{x(i) : \text{for all } i \text{ with } i_S = k\} \qquad (k \in I_S).$

So $p^S(k)$ is the probability that the allele $k_\alpha$ occurs at
locus $\alpha$ for all of the loci $\alpha$ in S.

If T is another subset of L, disjoint from S, then for
$i_S \in I_S$ and $j_T \in I_T$ we denote by $i_S j_T$ the element of
$I_{S \cup T}$ whose value at locus $\alpha$ agrees with
$(i_S)_\alpha$ for $\alpha \in S$ and with $(j_T)_\alpha$ for
$\alpha \in T$. In particular, if we denote by $\tilde{S}$ the
complement of the set S in L then, for $i \in I$,
$i = i_S i_{\tilde{S}}$.

These notations are just bookkeeping devices to keep from
writing genotypes and partial genotypes as lists of genes. We turn
now to the substance of the model.

1. The Equations of Selection, Recombination and Mutation.

In the vectorfield or differential equation model of population
genetics, evolution is regarded as due to the sum of the effects of
selection, recombination and mutation. Assuming the Hardy-Weinberg
condition, we represent each of these by a vectorfield on the space
of gametic genotype distributions, $\Delta$.

We have assumed that the diploid genotype of a member of the
population determines its biological characteristics. Among these are
two rates: a reproductive rate and a death rate. Each zygote of type
ij is assumed to have an average of $b_{ij}\,dt$ offspring in a time
interval of length dt and to have probability $d_{ij}\,dt$ of dying in
the same time interval. By an offspring of a zygote we mean two
gametes given to newborns, which are zygotes receiving complementary
gametes from other members of the population. On average the two
gametes contributed will be an i and a j. Since we are only counting
gametes we can think of an offspring of an ij zygote as a gain of an
ij zygote.

The gain or loss of an ij zygote causes the gain or loss of one
(if $i \neq j$) or two (if $i = j$) i gametes. Thus, if we define
fitness $m_{ij} = b_{ij} - d_{ij}$ the change in the number of i
gametes in time dt is given by:

        $dx_i = \Big(2m_{ii}x_{ii} + \sum_{j \neq i} m_{ij}x_{ij}\Big)\,dt = x_i m_i\,dt.$

Here we define $m_i = \sum_j m_{ij}p_j$ and get the last equation from
the Hardy-Weinberg assumption in the form
$x_{ij} = 2p_ip_j \cdot (|x|/2) = x_ip_j$ $(i \neq j)$ and
$2x_{ii} = 2p_i^2(|x|/2) = x_ip_i$, where $x_i = p_i \cdot |x|$ is the
number of gametes of type i.

So we get the first selection equation:

(1.1)   $\dfrac{dx_i}{dt} = x_i m_i.$

Recall that $|x| = \sum x_i$ is the total number of gametes. So:

(1.2)   $\dfrac{d|x|}{dt} = \sum_i x_i m_i = |x|\,\bar{m}.$

Since $p_i = x_i/|x|$ the quotient rule implies that:

(1.3)   $\dfrac{dp_i}{dt} = \dfrac{1}{|x|}\Big(\dfrac{dx_i}{dt} - p_i\,\dfrac{d|x|}{dt}\Big).$

Applying this to (1.1) and (1.2) we get:

(1.4)   $\dfrac{dp_i}{dt} = p_i(m_i - \bar{m}).$

Here $\bar{m} = \sum_i p_i m_i = \sum_{i,j} p_i p_j m_{ij}$ is mean
fitness. Note that we write $b_{ij}$, $m_{ij}$, etc. as functions of
unordered pairs or, as in the latter equation, as symmetric functions
of ordered pairs.
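
As a numerical illustration of the selection equation $dp_i/dt = p_i(m_i - \bar{m})$, here is a minimal sketch. The fitness matrix (heterozygote advantage), the starting point and the forward Euler integrator are all assumptions of the example, not part of the text; Euler stepping is used only because it is the simplest scheme to read.

```python
def selection_field(p, m):
    """Right-hand side p_i (m_i - mbar) for a symmetric fitness matrix m."""
    n = len(p)
    m_i = [sum(m[i][j] * p[j] for j in range(n)) for i in range(n)]
    mbar = sum(p[i] * m_i[i] for i in range(n))  # mean fitness
    return [p[i] * (m_i[i] - mbar) for i in range(n)]

def euler_step(p, m, dt):
    """One forward Euler step of the selection equation (illustrative only)."""
    v = selection_field(p, m)
    return [pi + dt * vi for pi, vi in zip(p, v)]

# Hypothetical fitnesses with heterozygote advantage: m_12 > m_11 = m_22.
fitness = [[1.0, 2.0],
           [2.0, 1.0]]
start = [0.2, 0.8]
field_at_start = selection_field(start, fitness)
```

The components of the field sum to zero, so the flow stays on the simplex; with these hypothetical fitnesses repeated Euler steps drift toward the polymorphic equilibrium p = (1/2, 1/2).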


Recall that the offspring of an ij zygote consisted of i and
j gametes. This assumes that there is no recombination. The
recombination term in the equation is the correction which must be
included if there is.

Let S be a subset of $L = \{1,\ldots,\ell\}$, the set of loci. With
probability $r^S$ an ij zygote will suffer a series of crossovers so
that i and j will exchange genetic material exactly in the loci
of S, or equivalently, exactly in the loci of the complement,
$\tilde{S} = L - S$. The offspring will then consist of
$\bar{i} = i_S j_{\tilde{S}}$ and $\bar{j} = j_S i_{\tilde{S}}$
gametes, where $i_S j_{\tilde{S}}$ is the element of I agreeing with i
at the loci of S and with j at the loci of $\tilde{S}$. The
recombination probabilities themselves can be under genetic control
in which case we write $r^S_{ij}$ for the probability of an S-exchange
in a parent of type ij. $r^S$ and $r^S_{ij}$ really depend only on the
pair $\{S, \tilde{S}\}$ and so we will assume
$r^S_{ij} = r^{\tilde{S}}_{ij}$ = one half of the actual recombination
probability.

In the most important example the loci are arranged in order
on a single chromosome. When a single crossover between the $\mu$ and
$\mu + 1$ loci ($1 \leq \mu < \ell$) occurs then
$S = S_\mu = \{\alpha \in L : \alpha \leq \mu\}$.

We saw above that $b_{ij}\,dt$ times $|x|p_ip_j$ gametes of type i are
contributed to the gene pool as offspring of the ij zygotes in a time
interval of length dt. Of these the fraction $r^S_{ij}$ are lost by
S-recombination. On the other hand,
$r^S_{\bar{i}\bar{j}}\,b_{\bar{i}\bar{j}}\,|x|\,p_{\bar{i}}p_{\bar{j}}\,dt$
gametes of type i are contributed by S-recombination of the
$\bar{i}\bar{j}$ zygotes. So the term which must be added to equation
(1.1) to correct for recombination is:

(1.5)   $\Big(\dfrac{dx_i}{dt}\Big)_R = -|x| \sum_{j,S} \Big[\, r^S_{ij}b_{ij}p_ip_j - r^S_{\bar{i}\bar{j}}b_{\bar{i}\bar{j}}p_{\bar{i}}p_{\bar{j}} \,\Big].$

If we sum these terms on i we get zero, meaning that the effect of
the correction on the gamete population growth rate, $(d|x|/dt)_R$, is
zero. So the correction term for $dp_i/dt$ is given by (see (1.3)):

(1.6)   $\Big(\dfrac{dp_i}{dt}\Big)_R = -\sum_{j,S} \Big[\, r^S_{ij}b_{ij}p_ip_j - r^S_{\bar{i}\bar{j}}b_{\bar{i}\bar{j}}p_{\bar{i}}p_{\bar{j}} \,\Big].$


The form of the recombination term is simpler if we assume that
$r^S_{ij}$ and $b_{ij}$ are completely symmetric, meaning
$r^S_{ij} = r^S_{\bar{i}\bar{j}}$ and $b_{ij} = b_{\bar{i}\bar{j}}$
for all i, j and S. That $r^S_{ij}$, $b_{ij}$ and $d_{ij}$ are
symmetric in i and j, e.g. $b_{ij} = b_{ji}$, is just a result of
thinking of the genotype of the zygote as an unordered pair of
gametes. The complete symmetry assumption means that the phenotypic
characteristics of the zygote, namely $b_{ij}$, $d_{ij}$ and the
$r^S_{ij}$'s, depend only on the genes and not on how they are
associated on the chromosomes. For example, in the two locus, two
allele case this means that the "coupling" and "repulsion"
heterozygotes have the same phenotype. The failure of complete
symmetry is one form of what geneticists refer to as position effects.

If complete symmetry holds then we can rewrite equation (1.6):

(1.7)   $\Big(\dfrac{dp_i}{dt}\Big)_R = -\sum_{j,S} r^S_{ij}b_{ij}\,(p_ip_j - p_{\bar{i}}p_{\bar{j}}).$

Be careful here of useful but misleading notation. $\bar{i}$ and
$\bar{j}$ each depend on i, j and S.
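
The recombination term under complete symmetry can be made concrete with a short sketch. Several simplifications here are assumptions of the example rather than of the text: loci are numbered from 0, the product $r^S_{ij}b_{ij}$ is taken to be the same for every pair ij and is passed in as a hypothetical dictionary `rates` keyed by the set S, and the distribution must list every genotype of the product set I.

```python
def exchange(i, j, S):
    """Return (ibar, jbar): ibar agrees with i on S and with j off S."""
    loci = range(len(i))
    ibar = tuple(i[a] if a in S else j[a] for a in loci)
    jbar = tuple(j[a] if a in S else i[a] for a in loci)
    return ibar, jbar

def recombination_field(p, rates):
    """-(sum over j and S) rates[S] * (p_i p_j - p_ibar p_jbar) for each i."""
    field = {}
    for i in p:
        total = 0.0
        for j in p:
            for S, rate in rates.items():
                ibar, jbar = exchange(i, j, S)
                total -= rate * (p[i] * p[j] - p[ibar] * p[jbar])
        field[i] = total
    return field

# Two loci, recombination probability 0.2 split equally between S and its
# complement, and birth rates set to 1 (all hypothetical values).
rates = {frozenset({0}): 0.1, frozenset({1}): 0.1}
p = {("A", "B"): 0.4, ("A", "b"): 0.1, ("a", "B"): 0.2, ("a", "b"): 0.3}
field = recombination_field(p, rates)
```

In this two-locus case the field on the AB gamete reduces to minus the recombination probability times $p_{AB}p_{ab} - p_{Ab}p_{aB}$, and it vanishes when p is a product of its marginals.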

The final member of our trinity is the correction due to
mutation. We take the equation straight from Wright [35, p. 369].
Let $n_{ij}$ be the relative rate by which i gametes are transformed
to j gametes by mutation when $i \neq j$. Define
$n_{i*} = \sum n_{ij}$, summing on all $j \neq i$. The correction for
(1.1) due to mutation is denoted $(dx_i/dt)_N$. It is given by:

(1.8)   $\Big(\dfrac{dx_i}{dt}\Big)_N = \sum_{j \neq i} x_j n_{ji} - x_i n_{i*} = |x|\Big(\sum_{j \neq i} p_j n_{ji} - p_i n_{i*}\Big).$

This says that the net rate of change of $x_i$ is the difference
between the absolute rates at which i gametes are produced and lost.
Again the sum on i is zero and so $(d|x|/dt)_N$ equals zero. So:

(1.9)   $\Big(\dfrac{dp_i}{dt}\Big)_N = \sum_{j \neq i} p_j n_{ji} - p_i n_{i*}.$


If we assume that mutations occur independently at the separate loci
then the $n_{ij}$'s have a special form which we will look at later.
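
The mutation correction for $dp_i/dt$ is a linear gain-minus-loss balance, which the following minimal sketch computes. The rate matrix is hypothetical, and gamete types are indexed by integers from 0 for convenience.

```python
def mutation_field(p, n):
    """Gain minus loss for each gamete type.

    n[i][j] is the relative rate at which i gametes mutate to j gametes
    (diagonal entries are ignored). The loss rate of type i is
    n_i* = sum of n[i][j] over j != i.
    """
    size = len(p)
    out = []
    for i in range(size):
        gain = sum(p[j] * n[j][i] for j in range(size) if j != i)
        loss = p[i] * sum(n[i][j] for j in range(size) if j != i)
        out.append(gain - loss)
    return out

# Hypothetical rates: type 0 mutates to type 1 at 1e-3, and back at 2e-3.
rates = [[0.0, 1e-3],
         [2e-3, 0.0]]
freqs = [0.7, 0.3]
field = mutation_field(freqs, rates)
```

As with selection and recombination, the components sum to zero, so mutation alone does not change the total gamete count.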

These equations are all in the textbooks of population
genetics, e.g. Crow and Kimura [6], although the notation which makes
recombination tractable for multilocus models is essentially due to
Shahshahani [28].*

I won't say much about the biological simplification built into
the model. For example, the assumption that the phenotype is
determined by the genotype means that we ignore or average out
environmental effects. Also the model has no age structure as we lump
all the zygotes together and don't include any lag time for
development. These matters are better described by biologists.
Jacquard [17], for example, has a particularly careful discussion of
the role of random mating and large population size in such models.
However, there are two points of interest which are really in the
mathematical domain.

Postulating the Hardy-Weinberg condition is a mathematically
odd way to proceed. What one ought to do is start with a model for
zygotic frequencies and then prove that the Hardy-Weinberg condition
follows. That is, show that under certain conditions every solution
of the zygotic differential equation tends toward the region (the
submanifold, actually) where the Hardy-Weinberg condition holds, or
at least that any solution which begins in the Hardy-Weinberg region
remains there. I didn't do it because it doesn't work. Hoppensteadt
has looked at such a model [16, Sec. II.2]. Only if the death rates
$d_{ij}$ are constant (i.e. independent of the genotype ij) is the
Hardy-Weinberg set preserved. He shows, however, that if the
$d_{ij}$'s are nearly constant then there is an invariant submanifold
close to the Hardy-Weinberg submanifold. This is one reason, among
several, that the model is limited to the case of "slow selection".

The other point has to do with the number of loci to which the
model is applicable. One of the central ideas of this paper is that
the introduction by Shahshahani of differential geometric methods to
the study of these classical equations should allow us to get beyond
the small models of the two locus two allele case in studying the
interaction between selection and recombination. But the vectorfield
model is still only a medium-sized model. While it is designed to
get beyond the two-locus models there is still a certain size
limitation. Once the number of genotypes n gets to be the order of
magnitude of the population size or greater, it no longer makes sense
to think of the gene pool as a continuous flow of genotype frequencies
because each genotype will appear in the pool only a small whole
number of times. This is the truism of genetic uniqueness. If there
are $n_\alpha$ alleles at every locus then $n = (n_\alpha)^\ell$. So
we must really assume

(1.10)   $\ell \ln n_\alpha < \ln |x|.$

Since $\ln 20 \approx 3$, if we are dealing with 20 alleles per locus
and a population of 1,000,000 or so then $\ell$ can't be much bigger
than 3 or 4. If there are only 2 alleles per locus then the model is
reasonable for 15 or 16 loci. In any case the vectorfield model can
only deal with a tiny number of loci compared to the actual genome of
most species.
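
The size limitation is simple arithmetic, and a small sketch makes it easy to try other numbers. The helper `max_loci` is hypothetical; it returns the largest number of loci for which the number of genotypes stays strictly below the gamete count, using integer arithmetic to avoid rounding issues with logarithms.

```python
def max_loci(alleles_per_locus, gametes):
    """Largest l with alleles_per_locus ** l < gametes."""
    loci = 0
    while alleles_per_locus ** (loci + 1) < gametes:
        loci += 1
    return loci

limit_20_alleles = max_loci(20, 1_000_000)
limit_2_alleles = max_loci(2, 1_000_000)
```

This gives only the strict upper limit. The more conservative figures quoted in the text (3 or 4 loci, or 15 or 16 loci) reflect the requirement that each genotype be present in more than a handful of copies, so n should sit well below $|x|$ rather than just under it.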


2. Multivariate Analysis and Types of Epistasis.

Consider a metric character $\xi_i$ or $\xi_{ij}$ which we think of
as a real-valued function of the gametic or zygotic genotype. In the
realm of genetic statistics we fix the gamete probability distribution
$p_i$ and regard these functions as random variables on the set of
genotypes. So the usual statistical functions are defined, such as
the mean:

(2.1)   $\bar{\xi} = \sum_i p_i\xi_i$   or   $\sum_{i,j} p_ip_j\xi_{ij}$

and the variance:

(2.2)   $\mathrm{Var}(\xi) = \sum_i p_i(\xi_i - \bar{\xi})^2$   or   $\sum_{i,j} p_ip_j(\xi_{ij} - \bar{\xi})^2.$

Given two such random variables $\xi$ and $\eta$ we define their
covariance:

(2.3)   $\mathrm{Cov}(\xi,\eta) = \sum_i p_i(\xi_i - \bar{\xi})(\eta_i - \bar{\eta})$   or   $\sum_{i,j} p_ip_j(\xi_{ij} - \bar{\xi})(\eta_{ij} - \bar{\eta}).$
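
For a gametic character these definitions translate directly into code. The sketch below uses hypothetical function names and a two-type example; characters and distributions are dictionaries keyed by genotype.

```python
def mean(p, xi):
    """Weighted mean of the character xi under the distribution p."""
    return sum(p[i] * xi[i] for i in p)

def cov(p, xi, eta):
    """Weighted covariance of two characters under the distribution p."""
    xbar, ebar = mean(p, xi), mean(p, eta)
    return sum(p[i] * (xi[i] - xbar) * (eta[i] - ebar) for i in p)

def var(p, xi):
    """Variance is the covariance of a character with itself."""
    return cov(p, xi, xi)

# Hypothetical two-type example.
p = {"a": 0.5, "b": 0.5}
xi = {"a": 1.0, "b": 3.0}
eta = {"a": 2.0, "b": 6.0}
```

The zygotic versions are the same sums taken over ordered pairs (i, j) with weights $p_ip_j$.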

The historical bridge between the genetic statistics of a fixed
population and the evolution problem is in the response of various
metric traits to artificial selection. It becomes important to
determine the contribution of different loci or blocs of loci to the
total effect as well as the interaction between the loci. For example,
a character is called additive if the total effect is the sum of
effects at the various loci. This means that the function $\xi_i$ from
I to $\mathbb{R}$ can be written as a sum
$\sum_\alpha \varphi^\alpha(i_\alpha)$ where $\varphi^\alpha$ is a
function on the alleles $I_\alpha$ at the $\alpha$ locus. A positive
character is called multiplicative if its log is additive. In the case
where $\xi_i$ is gametic fitness, $m_i$, additivity is also referred
to as the absence of epistasis or zero epistasis. We will use the term
epistasis to refer to interaction between the loci for any character
under consideration. We formalize different types of epistasis.

Let K be a nonempty collection of subsets of L, the set of
loci, such that $S_1 \in K$ and $S_2 \subset S_1$ imply $S_2 \in K$.
We will call such a collection a complex of loci or gene complex. If
K is a complex of loci then we will say that a character $\xi$,
regarded as a function $\xi : I \to \mathbb{R}$, is carried by K or
has K-type epistasis if there exist functions
$\varphi^S : I_S \to \mathbb{R}$ for $S \in K$ such that

(2.4)   $\xi_i = \sum \{\varphi^S(i_S) : S \in K\}.$

So a function $\xi$ has K-type epistasis if it is the sum of functions
each depending only on a bloc of loci in K. For example, $L^{(0)}$
consisting of the empty set and each single locus (i.e.
$L^{(0)} = \{\emptyset, \{1\}, \{2\}, \ldots, \{\ell\}\}$) is a complex
called the zero-skeleton of L. $L^{(0)}$-type epistasis is just what
we called zero-epistasis above. Note that a function
$\varphi^{\emptyset}$ depending on none of the loci is just a
constant. Similarly, $\xi$ depends only on pairs of loci, or has
one-dimensional epistasis, if it is carried by the one-skeleton
$L^{(1)}$ consisting of the sets in $L^{(0)}$ and all pairs of loci.
In general, for $s \leq \ell - 1$ we can define the s-skeleton
$L^{(s)} = \{S \subset L : S \text{ consists of } s+1 \text{ or fewer loci}\}$.
We will refer to $L^{(s)}$-type epistasis as s-dimensional epistasis.
The geneticist would say that such a character exhibits (s+1)-way
interactions.

If $K_1$ and $K_2$ are complexes then the union, written
$K_1 \vee K_2$, and the intersection, written $K_1 \wedge K_2$, are
again complexes. If S is any bloc of loci (i.e. $S \subset L$) then S
together with all of its subsets is a complex which we will also refer
to as S. One reason for this deliberate ambiguity is that if
$S_1 \subset S$ and $\varphi^{S_1} : I_{S_1} \to \mathbb{R}$ then we
can regard $\varphi^{S_1}$ as a function on $I_S$ by
$\varphi^{S_1}(k) = \varphi^{S_1}(k_{S_1})$ for $k \in I_S$. Here
$k_{S_1}$ is the projection to the subproduct $I_{S_1}$ which just
forgets the part of the genotype not in $S_1$. So if
$\varphi^S : I_S \to \mathbb{R}$ we can regard
$\varphi^S + \varphi^{S_1}$ as a function on $I_S$. Thus, we can
amalgamate together the functions on subsets of S to get just one
function on $I_S$. Doing this in formula (2.4) we see that $\xi$ has
S-type epistasis if it can be written as a single function of $i_S$.
This means that $\xi$ depends only on the loci in S. That is,
variation of the genotype in the loci not in S has no effect on the
value of the character $\xi$. This suggests a generalization of
zero-epistasis different from s-dimensional epistasis. Suppose
$\{T_\alpha : \alpha = 1,\ldots,\ell'\}$ is a set of pairwise disjoint
subsets of L, i.e. each locus occurs in at most one set $T_\alpha$.
Regarding each $T_\alpha$ as a complex we can form the union, as
complexes, and so get the disjoint bloc model
$T_1 \vee \ldots \vee T_{\ell'}$. A character shows this kind of
epistasis if it is the sum of effects each depending only on the loci
in one of the blocs $T_\alpha$, i.e. it is additive between the blocs.
$L^{(0)}$ is a disjoint bloc model where the $T_\alpha$'s each consist
of a single locus.

One remark about language. A geneticist would use the term
gene complex to refer to a collection of associated loci, in other
words, to what I am calling a bloc. Mathematically, these blocs are
the simplices of the complex K.

In studying epistasis it is important to have a test to see
whether a character $\xi$ satisfies K-type epistasis. For example,
when $K = L^{(0)}$ we are given a function
$\xi(i) = \xi(i_1,\ldots,i_\ell)$ which we can think of as a function
in $\ell$ different variables and we want to know when $\xi$ can be
written as a sum:

(2.5)   $\xi(i_1,\ldots,i_\ell) = \varphi^1(i_1) + \varphi^2(i_2) + \ldots + \varphi^\ell(i_\ell).$

The variable $i_\alpha$ is discrete as it varies over the finite set
$I_\alpha$. However, the answer to the question is easier when the
variables $i_\alpha$ are continuous real variables. Consider the case
when $\ell = 2$ and so $\xi$ is a function of $(i_1,i_2)$ with $i_1$
and $i_2$ elements of $\mathbb{R}$. Suppose that $\xi$ is smooth,
meaning that all partial derivatives are defined and continuous.
Clearly, if $\xi(i_1,i_2) = \varphi^1(i_1) + \varphi^2(i_2)$ then the
mixed partial derivative:

(2.6)   $\dfrac{\partial^2 \xi}{\partial i_1\,\partial i_2} = 0.$

Conversely, if (2.6) holds then $\partial\xi/\partial i_2$ doesn't
depend on $i_1$ and neither does its integral with respect to $i_2$
which we will call $\varphi^2(i_2)$. $\xi$ and $\varphi^2$ have the
same partial derivative with respect to $i_2$ and so:

        $\dfrac{\partial(\xi - \varphi^2)}{\partial i_2} = 0.$

Thus, $\xi - \varphi^2$ doesn't depend on $i_2$ and so is a function
$\varphi^1(i_1)$. This proves (2.5) from (2.6) in the case $\ell = 2$.
A similar argument using mathematical induction on $\ell$ proves that
(2.5) holds if and only if

(2.7)   $\dfrac{\partial^2 \xi}{\partial i_\alpha\,\partial i_\beta} = 0$   for all $\alpha \neq \beta \in L.$

In general, for smooth functions with $\ell$ real variables the
analogue of K-type epistasis corresponds to the vanishing of various
mixed partial derivatives. For example $L^{(s)}$-type epistasis
corresponds to the vanishing of all (s+2)-fold mixed partials.

In the discrete variable case we will derive general formulae
for detecting K-type epistasis in Chapter II. The basic tool in
constructing the formulae is the discrete analogue of the partial
derivative operator.
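
For two loci, one elementary discrete counterpart of the mixed partial test can be sketched as follows. This is an illustration only and is not claimed to be the formula of Chapter II: it checks that the mixed second difference of the character, taken against an arbitrarily chosen base allele at each locus, vanishes for every genotype. If it does, then $\xi(a,b) = \xi(a,b_0) + \xi(a_0,b) - \xi(a_0,b_0)$, which is a sum of a function of a and a function of b. The function name and the example values are hypothetical.

```python
def is_additive(xi, alleles1, alleles2, tol=1e-12):
    """Two-locus test for zero epistasis via mixed second differences.

    xi maps pairs (a, b) to character values. The first allele of each
    list serves as the base point (any choice gives the same answer).
    """
    a0, b0 = alleles1[0], alleles2[0]
    return all(
        abs(xi[(a, b)] - xi[(a, b0)] - xi[(a0, b)] + xi[(a0, b0)]) <= tol
        for a in alleles1
        for b in alleles2
    )

# Hypothetical single-locus effects, summed to give an additive character.
phi1 = {"A": 1.0, "a": 2.5}
phi2 = {"B": 0.5, "b": -1.0}
additive = {(a, b): phi1[a] + phi2[b] for a in phi1 for b in phi2}

# The same character with an interaction added to one genotype.
epistatic = dict(additive)
epistatic[("a", "b")] += 0.3
```

Here `is_additive(additive, ["A", "a"], ["B", "b"])` holds while the perturbed character fails the test.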

So far we have made no use of the probability distribution p
which weighs the points of I. It is used in the analysis of the
variance of $\xi$.

Suppose that $\xi$ is a character which does show some epistasis.
We can ask: what is the best zero-epistasis approximation $\xi^0$ to
$\xi$? This means first, that $\xi^0$ has zero-epistasis and, second,
that the mean of $\xi^0$ equals the mean of $\xi$. The mean comes in
because the mean of $\xi$ is the best approximation of $\xi$ by a
constant. Third, the variance of the "error" $\xi - \xi^0$ is assumed
to be smaller for the choice $\xi^0$ than for any other choice of
approximator satisfying the first two conditions. So we are using a
least-squares notion of approximation. As we will see in the next
section this sort of approximation arises naturally in linear algebra
and from such general considerations it follows that a best
approximation always exists. It also follows that the variance of
$\xi$ is the sum of the variances of $\xi^0$ and of the error
$\xi - \xi^0$. So we can answer the question: how much of the variance
of $\xi$ can be attributed to interaction between the loci? The answer
is the variance of $\xi - \xi^0$. If most of the variance of $\xi$
lies in $\xi^0$ then we can throw away $\xi$ and use the approximation
$\xi^0$ instead and thus suppose that the character is additive. How
good the approximation has to be depends on the tolerances of the
application at hand. If too much of the variance remains in the error,
we can look to pairwise interactions and take the best $L^{(1)}$
approximation of $\xi - \xi^0$ which we call $\xi^1$. Then
$\xi^0 + \xi^1$ is the best approximation of $\xi$ having only
$L^{(1)}$-type epistasis. Continuing by approximating
$\xi - \xi^0 - \xi^1$ among $L^{(2)}$ functions we get $\xi^2$ and so
forth. This partitioning of the variance of $\xi$ into terms involving
higher and higher interactions is a standard device in genetic
statistics (see for example Kempthorne [20, Chaps. 13 and 19]). It
would clearly be useful to have a general formula for the best K-type
epistasis approximation to $\xi$. In an important special case such a
formula can be derived using the discrete partial derivative operators
mentioned above. We carry this out in Chapter II. The special case is
when the loci are in linkage equilibrium, meaning that the different
loci are probabilistically independent. Equivalently, the distribution
p on the product set I is just the product distribution obtained from
the marginal distributions $p^\alpha$ on the factors $I_\alpha$. This
is equivalent to the formula

(2.8)   $p_i = p(i_1,\ldots,i_\ell) = p^1(i_1) \cdot p^2(i_2) \cdot \ldots \cdot p^\ell(i_\ell).$

The set of distributions in linkage equilibrium is a subset $\Lambda$
of the set $\Delta$ of all distributions. Shahshahani calls $\Lambda$
the Wright manifold and we will meet it again. For now notice that if
we think of p as a metric trait (it is, after all, a real-valued
function on I) then if p is in $\Lambda$, (2.8) implies that p is
multiplicative, i.e. the log, $\ln p_i$, has zero-epistasis. The
converse is true and we will see later that this partly accounts for
the key role of $\Lambda$ in the mathematics.

The projections of $\xi$ to its approximations are not so nice
if the distribution is not in $\Lambda$. This is one reason why the
textbooks tend to assume linkage equilibrium.
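
For two loci in linkage equilibrium the best additive approximation has a familiar closed form, the main-effects decomposition of a two-way analysis of variance: the two conditional means added together, minus the overall mean. The sketch below assumes that standard form (it is not quoted from Chapter II) and uses hypothetical names and values throughout; it also lets one check numerically that the variance splits into an additive part and an interaction part.

```python
def mean(p1, p2, f):
    """Mean of f under the product distribution p1 x p2."""
    return sum(p1[a] * p2[b] * f[(a, b)] for a in p1 for b in p2)

def variance(p1, p2, f):
    """Variance of f under the product distribution p1 x p2."""
    mu = mean(p1, p2, f)
    return sum(p1[a] * p2[b] * (f[(a, b)] - mu) ** 2 for a in p1 for b in p2)

def additive_part(p1, p2, xi):
    """Least-squares additive approximation under linkage equilibrium."""
    mu = mean(p1, p2, xi)
    row = {a: sum(p2[b] * xi[(a, b)] for b in p2) for a in p1}  # mean given a
    col = {b: sum(p1[a] * xi[(a, b)] for a in p1) for b in p2}  # mean given b
    return {(a, b): row[a] + col[b] - mu for a in p1 for b in p2}

# Hypothetical allele frequencies and a character with some epistasis.
p1 = {"A": 0.5, "a": 0.5}
p2 = {"B": 0.6, "b": 0.4}
xi = {("A", "B"): 1.0, ("A", "b"): 2.0, ("a", "B"): 3.0, ("a", "b"): 7.0}

xi0 = additive_part(p1, p2, xi)
error = {k: xi[k] - xi0[k] for k in xi}
```

The variance of `error` is the share of the variance attributable to interaction between the two loci. Away from linkage equilibrium the conditional means no longer give the least-squares projection, which is the complication the text alludes to.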

3. Euclidean Vector Spaces and Riemannian Manifolds.

Suppose that f is a function from $\mathbb{R}^{n_1}$ to
$\mathbb{R}^{n_2}$, a list of $n_2$ real functions of $n_1$ real
variables. More generally, suppose that f is a function between
vector spaces $V_1$ and $V_2$, a vector-valued function of a vector
variable. What then does differentiation mean? What is the derivative
of f at a point x of the domain? Recall from a first course in
calculus that for a real function of a real variable
($n_1 = n_2 = 1$) the derivative, $f'(x)$, is a number. This misleads
one from the general answer: The derivative of f at a point x is a
function, but a linear function. It is the linear mapping which is in
some sense (quite different from a least squares idea) the best
approximation to f near x by a linear map. Looked at this way, the
purpose of calculus is to convert problems about nonlinear functions
to problems about linear ones (see Palais [27, Chap. 1]). In short,
calculus is generalized linear algebra. So before discussing
manifolds, which are places where one can do calculus, we first review
some ideas from linear algebra.

A real vector space or linear space is a set whose elements are
called vectors together with a definition of addition of vectors and
of multiplication of vectors by real numbers (also called scalars).
Addition and multiplication are required to satisfy certain standard
axioms. The most important example is $\mathbb{R}^n$, the set of
ordered n-tuples of real numbers with coordinate-wise addition and
multiplication.

(3.1)   $(x_1,\ldots,x_n) + (y_1,\ldots,y_n) = (x_1 + y_1,\ldots,x_n + y_n)$
        $t\,(x_1,\ldots,x_n) = (tx_1,\ldots,tx_n)$

Most of the examples we will meet are subspaces of some
$\mathbb{R}^n$. A subset of a vector space is a subspace, i.e. is a
vector space in its own right, if it is closed under addition and
scalar multiplication. For the three dimensional space $\mathbb{R}^3$
the subspaces, other than the trivial extremes of $\mathbb{R}^3$
itself and the set consisting of 0 alone, are the lines and planes
which contain 0. Notice that a line or plane which does not contain 0
is not a subspace. It is neither closed under addition nor under
scalar multiplication.

The axiomatic viewpoint is important even with these examples
because it is used to construct new vector spaces. For example, the
set of linear maps between two vector spaces is itself a vector space.
A linear map $T : V_1 \to V_2$ is a function which relates the vector
space operations:

(3.2)   $T(\xi + \eta) = T(\xi) + T(\eta)$   $\xi, \eta \in V_1$
        $T(t \cdot \xi) = t \cdot T(\xi)$   $\xi \in V_1$ and $t \in \mathbb{R}$.

Here the operations on the left are occurring in $V_1$ and those on
the right are in $V_2$. These linearity properties are very special.
For example, the false assumption of linearity underlies many mistakes
in elementary algebra, e.g. $\sqrt{x+y} = \sqrt{x} + \sqrt{y}$
(false). The set of all linear maps between $V_1$ and $V_2$, denoted
$L(V_1,V_2)$, becomes a vector space when we define addition and
scalar multiplication by:

(3.3)   $(T_1 + T_2)(\xi) = T_1(\xi) + T_2(\xi)$   $(T_1, T_2 \in L(V_1,V_2),\ \xi \in V_1)$
        $(t \cdot T)(\xi) = t\,(T(\xi))$   $(T \in L(V_1,V_2),\ \xi \in V_1,\ t \in \mathbb{R})$

Here the operations on the right are in $V_2$ and are defining the
linear maps $T_1 + T_2$ and $t \cdot T$ by describing their value on a
typical element $\xi$ of $V_1$. It is a good exercise to show that
$T_1 + T_2$ and $t \cdot T$ so defined are linear maps, i.e. they
satisfy (3.2).

Actually, this definition of addition and multiplication for
functions comes directly from (3.1). We can regard an n-tuple
$(x_1,\ldots,x_n)$ as a function $x : \{1,\ldots,n\} \to \mathbb{R}$
with $x(i) = x_i$. In general, if I is any set and $\mathbb{R}^I$ is
the set of all functions from I to $\mathbb{R}$ we define:

(3.4)   $(x + y)(i) = x(i) + y(i)$   $x, y \in \mathbb{R}^I,\ i \in I$
        $(t \cdot x)(i) = t \cdot x(i)$   $x \in \mathbb{R}^I,\ i \in I,\ t \in \mathbb{R}$.

When I is the set $\{1,\ldots,n\}$ this definition coincides with (3.1).

The most important space of linear maps is the dual space of a
vector space V, also called the space of linear forms on V. The
dual space, denoted $V^*$, is $L(V,\mathbb{R})$. It is the space of
linear maps from V to the reals. If $\xi \in V$ and $\omega \in V^*$
then the value of $\omega$ at $\xi$, i.e. $\omega(\xi)$, is also
denoted $\langle\omega,\xi\rangle$ and is then called the Kronecker
product. So

(3.5)   $\langle\omega,\xi\rangle = \omega(\xi)$   $\omega \in V^*,\ \xi \in V$.

Regarded as a function of two variables, $\omega$ and $\xi$, the
product $\langle\ ,\ \rangle$ is bilinear, that is, it is linear in
each variable alone with the other held fixed.

The linear operations allow us to construct new vectors from
old. If $\xi^1,\ldots,\xi^n$ is a list of vectors in V and
$x_1,\ldots,x_n$ is a list of scalars then the vector
$\xi = x_1\xi^1 + \ldots + x_n\xi^n = \sum_i x_i\xi^i$ is called the
linear combination of the vectors $\xi^1,\ldots,\xi^n$ with
coefficients $x_1,\ldots,x_n$. The list of vectors is called linearly
independent if we can equate coefficients, that is, if
$\sum x_i\xi^i = \sum y_i\xi^i$ implies $x_i = y_i$ for
$i = 1,\ldots,n$. So we can translate a vector equation into a list
of scalar equations. If $\xi^3 = \xi^1 + \xi^2$, for example, then
$\{\xi^1,\xi^2,\xi^3\}$ is not linearly independent, and so is called
linearly dependent, because
$1 \cdot \xi^1 + 1 \cdot \xi^2 + (-1) \cdot \xi^3 = 0 \cdot \xi^1 + 0 \cdot \xi^2 + 0 \cdot \xi^3$.
If every vector in V is a linear combination of the linearly
independent set $\{\xi^1,\ldots,\xi^n\}$, so that the list also spans
V, then we say that the set is a basis for V. The structure theorem
of linear algebra says that while a vector space has in general many
different bases, any two bases have the same number of elements. This
number is called the dimension of V. If $\{\xi^1,\ldots,\xi^n\}$ is
any list of vectors in V then there is a linear map
$T : \mathbb{R}^n \to V$ defined by:

(3.6)   $T(x_1,\ldots,x_n) = \sum_i x_i\xi^i.$


If $\{\xi^1,\ldots,\xi^n\}$ is a basis then this map is onto, meaning
that every vector in V is in the image of T, because the set spans.
It is one-to-one, meaning that no two lists of coefficients hit the
same vector in V, because the set is linearly independent. A
one-to-one and onto linear map $T : V_1 \to V_2$ is called a linear
isomorphism. If T is a linear isomorphism then the inverse map
$T^{-1} : V_2 \to V_1$ is defined and is also linear. For the special
case defined by (3.6) with $\{\xi^1,\ldots,\xi^n\}$ a basis for V this
inverse map associates to every vector $\xi$ in V the list of
coefficients $(x_1,\ldots,x_n)$ such that $\xi = \sum x_i\xi^i$. The
scalars $(x_1,\ldots,x_n)$ are then called the coordinates of $\xi$
with respect to the basis. A different basis will in general give
different lists of coordinates for the vector $\xi$. In general, if
the vectors of a basis are indexed by a set I, then the above
construction gives an isomorphism of $\mathbb{R}^I$ with V. On
$\mathbb{R}^I$ itself the standard basis $\{e^i : i \in I\}$ is
defined by letting $e^i$ be zero at all points of I except i at which
$e^i$ is one. So if we define the Kronecker delta $\delta_{ij}$ by
$\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ii} = 1$ then the basis
is defined by:

(3.7)   $e^i(j) = \delta_{ij}.$

We call it the standard basis because the coordinate map
$\mathbb{R}^I \to \mathbb{R}^I$ is the identity, i.e.
$x = \sum x(i)e^i$.

These coordinate maps show that all vector spaces having a
finite basis, the so-called finite dimensional vector spaces, are
just copies under some isomorphism of $R^n$ where n is the dimension
of the space.

If $(\xi^1,\dots,\xi^n)$ is a basis chosen for $V_1$ and $(\eta^1,\dots,\eta^m)$ is a
basis chosen for $V_2$ then this choice of bases associates to every
linear map $T: V_1 \to V_2$ an m x n matrix $(a_{ij})$ ($i = 1,\dots,m$ and
$j = 1,\dots,n$) satisfying the equation

(3.8)  $y_i = \sum_j a_{ij}x_j, \qquad i = 1,\dots,m$

whenever $(x_1,\dots,x_n)$ are the coordinates of $\xi$ in $V_1$ with respect to
$(\xi^1,\dots,\xi^n)$ and $(y_1,\dots,y_m)$ are the coordinates of $T(\xi)$ in $V_2$ with
respect to $(\eta^1,\dots,\eta^m)$. $(a_{ij})$ is defined by letting the j-th column
$a_{ij}$, $i = 1,\dots,m$, be the coordinates of $T(\xi^j)$ with respect to the $\eta$
basis. The composition of two linear maps $T_2 \circ T_1: V_1 \to V_3$, where
$T_1: V_1 \to V_2$ and $T_2: V_2 \to V_3$, is again a linear map and the matrix of
the composition $T_2 \circ T_1$ is the product in the same order of the two
matrices for $T_2$ and $T_1$. In fact, this is the reason behind the odd
definition of the product of matrices.
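
The statement that composition corresponds to matrix multiplication can be checked on a small case. The following is only a sketch: the two matrices, the names `mat_vec` and `mat_mul`, and the test vector are hypothetical choices made for illustration, with the standard bases assumed throughout.

```python
def mat_vec(a, x):
    """Apply the m x n matrix a to the coordinate list x, as in (3.8)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def mat_mul(b, a):
    """Matrix product b * a, with b of size p x m and a of size m x n."""
    cols = list(zip(*a))
    return [[sum(b_ik * a_kj for b_ik, a_kj in zip(row, col)) for col in cols]
            for row in b]


# Hypothetical matrices: T1 maps R^2 to R^3 and T2 maps R^3 to R^2.
A1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
A2 = [[2.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
x = [1.0, -2.0]

# Apply T1 and then T2, versus applying the single matrix of T2 o T1.
step_by_step = mat_vec(A2, mat_vec(A1, x))
all_at_once = mat_vec(mat_mul(A2, A1), x)
```

Both routes give the same coordinates, which is the content of the remark about the order of the product.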


In addition to the algebraic concepts of addition and
multiplication we need a definition of the distance between vectors before
the limit operation in calculus or any concept of approximation makes
sense. We make the definition by using a Euclidean metric or inner
product on a vector space V. This is a function $(\ ,\ ): V \times V \to R$,
i.e. a real-valued function of two vector variables. $(\ ,\ )$ is
bilinear and symmetric (i.e. $(\xi,\eta) = (\eta,\xi)$). Furthermore, it
satisfies:

(3.9)  $(\xi,\xi) > 0$ if $\xi \in V$ and $\xi \neq 0$.

Notice that, because of bilinearity, $(0,\xi) = (\xi,0) = 0$ for any vector
$\xi$ in V.

This allows us to define the length, or norm, or absolute value,
of a vector by

(3.10)  $\|\xi\| = (\xi,\xi)^{1/2}.$

By analogy with the real numbers we define the distance between $\xi$
and $\eta$ to be the length of the difference, $\|\xi - \eta\| = \|\eta - \xi\|$.

On the space $R^I$ there is the so-called usual inner product:

(3.11)  $(\xi,\eta) = \sum_i \xi_i\eta_i.$

More generally, if p is a distribution on I, i.e. $p \in \Delta$, then we
can define the covariance metric:

(3.12)  ${}_p(\xi,\eta) = \sum_i p_i\xi_i\eta_i.$

${}_p(\ ,\ )$ is symmetric and bilinear but satisfies (3.9) only if p is
an interior distribution, meaning that $p_i > 0$ for all i. So for
(3.12) to define a Euclidean metric we must have
$p \in \Delta^\circ = \{p \in R^I : \sum_i p_i = 1 \text{ and } p_i > 0 \text{ for all } i \in I\}$.

The inner product gives more than just the length. For any
inner product $(\ ,\ )$ on a vector space V:

(3.13)  $(\xi,\eta) = \|\xi\| \cdot \|\eta\| \cdot \cos\theta$

where $\theta$ is the angle between the two vectors. In $R^2$ or $R^3$ with
the usual inner product this is a theorem of trigonometry (the law
of cosines). For a general vector space equipped with a fixed
Euclidean metric (we will call such a space a Euclidean vector space)
(3.13) is used to define the angle $\theta$. Then by using bilinearity to
expand $\|\xi + \eta\|^2 = (\xi + \eta, \xi + \eta)$ we get the law of cosines as a theorem:

(3.14)  $\|\xi + \eta\|^2 = \|\xi\|^2 + \|\eta\|^2 + 2(\xi,\eta)$
        $= \|\xi\|^2 + \|\eta\|^2 + 2\|\xi\|\,\|\eta\|\cos\theta.$

It is a theorem that $(\xi,\eta)/(\|\xi\| \cdot \|\eta\|)$ always has absolute value
at most 1 (Schwarz inequality) and so it makes sense to regard it as
the cosine of an angle. In particular, this angle is a right angle
if and only if the cosine is zero. So $\xi$ and $\eta$ are perpendicular,
or orthogonal, if and only if $(\xi,\eta) = 0$. With respect to the usual
inner product on $R^I$ distinct members of the standard basis are
orthogonal. Furthermore, the length of each basis vector is 1. We can
summarize this by saying that for $\xi^i = e^i$ ($i \in I$):

(3.15)  $(\xi^i,\xi^j) = \delta_{ij}, \qquad i,j \in I.$

In general, in a Euclidean vector space a basis which satisfies (3.15)
is called an orthonormal basis. For example, with respect to ${}_p(\ ,\ )$
the basis $\{\xi^i = p_i^{-1/2}e^i\}$ ($i \in I$) is orthonormal. A general procedure
called the Gram-Schmidt orthogonalization process constructs an
orthonormal basis starting from any basis.
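
As an illustration of the Gram-Schmidt process, here is a minimal sketch for the covariance metric (3.12). The distribution `p`, the starting basis, and the function names are assumptions chosen for the example; the procedure itself is the textbook one (in its "modified" form, which subtracts projections one at a time).

```python
import math


def inner_p(p, xi, eta):
    """Covariance inner product (3.12): sum_i p_i * xi_i * eta_i."""
    return sum(p_i * a * b for p_i, a, b in zip(p, xi, eta))


def gram_schmidt(basis, inner):
    """Orthonormalize a linearly independent list with respect to `inner`."""
    ortho = []
    for v in basis:
        w = list(v)
        for u in ortho:
            c = inner(u, w)                      # component of w along u
            w = [w_i - c * u_i for w_i, u_i in zip(w, u)]
        norm = math.sqrt(inner(w, w))            # nonzero for independent input
        ortho.append([w_i / norm for w_i in w])
    return ortho


p = [0.5, 0.3, 0.2]                              # hypothetical interior distribution
basis = [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ortho = gram_schmidt(basis, lambda a, b: inner_p(p, a, b))
```

Because the first input vector is the constant vector 1, which has length 1 in this metric, the first output vector is 1 itself and the remaining two have mean zero with respect to p.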

If the basis of V is orthonormal then the linear isomorphism
from $R^I$ to V defined by the basis and equation (3.6) is also an
isometry with the usual metric on $R^I$ and the given metric on V. A
linear map $T: V_1 \to V_2$ between Euclidean vector spaces is called an
isometry if it preserves the metrics:

(3.16)  $(\xi,\eta)_1 = (T(\xi),T(\eta))_2, \qquad \xi,\eta \in V_1.$

An isometry preserves length and distance and so is one-to-one. It
is an isomorphism if it is onto. In that case the inverse map is
also an isometry.

Since an orthonormal basis always exists we see that every
finite-dimensional Euclidean vector space is isometrically isomorphic
to $R^n$ with the usual metric where n is the dimension of the space.

Every linear map $T: R \to V$ can be naturally identified with a
vector $\xi$ in V, namely, $\xi = T(1)$, because $T(t) = tT(1) = t\xi$. This
gives a linear isomorphism between $L(R,V)$ and the space V itself.

Using the inner product we can get a quite different
isomorphism between V and the space of linear maps from V to R, i.e.
the dual space $V^*$. Every vector $\eta \in V$ defines a linear form
$\eta^*: V \to R$ via the inner product, namely $\eta^*(\xi) = (\eta,\xi)$. The
association of $\eta^*$ with $\eta$ defines a linear map of V into $V^*$ by bilinearity
of $(\ ,\ )$. It is easily seen to be one-to-one because if $\eta^* = 0$ then
$\eta^*(\eta) = (\eta,\eta) = 0$ and so $\eta = 0$. The Riesz representation theorem says
that this map is onto and so defines a linear isomorphism between V
and its dual:

1 Theorem: Let V be a finite dimensional Euclidean space. For
every linear form $\omega: V \to R$ there exists a unique vector $\eta \in V$ such
that $\omega(\xi) = (\eta,\xi)$ for all $\xi \in V$.

Proof: Choose an orthonormal basis $\{\xi^i\}$ for V. With respect to
this basis, and the number 1 chosen as a basis for R, $\omega$ is
represented by a 1 x n matrix. These n numbers are the coordinates of
$\eta$ with respect to the $\xi$-basis. In more detail, if the matrix is $(a_i)$
then $\omega(\xi) = \sum_i a_i x_i$ where $\{x_i\}$ are the coordinates of $\xi$ with respect
to $\{\xi^i\}$, i.e. $\xi = \sum_i x_i\xi^i$. Define $\eta = \sum_i a_i\xi^i$. Then by (3.15)
$(\eta,\xi) = \sum_{i,j} a_i x_j(\xi^i,\xi^j) = \sum_{i,j} a_i x_j\delta_{ij} = \sum_i a_i x_i = \omega(\xi)$. So this $\eta$ works,
i.e. $\eta^* = \omega$. It is the only one which does because the map $\eta \to \eta^*$ is
one-to-one. QED
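
A concrete instance may help. On $R^I$ with the covariance metric (3.12), a linear form $\omega(\xi) = \sum_i a_i\xi_i$ is represented by the vector with components $a_i/p_i$, since then ${}_p(\eta,\xi) = \sum_i p_i(a_i/p_i)\xi_i$. The sketch below checks this numerically; the names and the sample numbers are hypothetical.

```python
def inner_p(p, xi, eta):
    """Covariance inner product (3.12)."""
    return sum(p_i * a * b for p_i, a, b in zip(p, xi, eta))


def riesz_vector(a, p):
    """Vector eta with inner_p(p, eta, xi) == sum_i a_i * xi_i for every xi."""
    return [a_i / p_i for a_i, p_i in zip(a, p)]


p = [0.5, 0.3, 0.2]            # hypothetical interior distribution
a = [2.0, -1.0, 4.0]           # coefficients of the linear form omega
eta = riesz_vector(a, p)

xi = [1.0, 3.0, -2.0]          # an arbitrary test vector
omega_of_xi = sum(a_i * x_i for a_i, x_i in zip(a, xi))
via_inner_product = inner_p(p, eta, xi)
```

The same form $\omega$ is represented by a different vector under a different metric, which is the point taken up later for gradients.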

This simple result has many profound consequences. For the
moment, we will use it to define the least squares approximations
which we used in the previous section.

2 Theorem: Let V be a finite dimensional Euclidean space and A
be a subspace of V. If $\xi$ is a vector in V there is a unique
vector $\xi_A$ in A satisfying the following equivalent conditions:

(i)   $(\eta,\xi) = (\eta,\xi_A)$ for all $\eta$ in A.

(ii)  $\xi - \xi_A$ is orthogonal to every vector $\eta$ in A.

(iii) For every vector $\eta$ in A the Pythagorean identity holds:

(3.17)  $\|\xi - \eta\|^2 = \|\xi - \xi_A\|^2 + \|\xi_A - \eta\|^2.$

In particular, if $\eta = 0$ then we have:

(3.18)  $\|\xi\|^2 = \|\xi - \xi_A\|^2 + \|\xi_A\|^2.$

Proof: $\xi^*$ is the linear form on V defined by $\xi^*(\eta) = (\eta,\xi)$.
Restricting to A we get a linear form on A and so by Thm. 1 there
is a unique vector $\xi_A$ in A such that $\xi^*(\eta) = \xi_A^*(\eta)$ for all $\eta$ in
A. This proves that a unique $\xi_A$ satisfying (i) exists. Since
$(\eta,\xi) = (\eta,\xi_A)$ if and only if $(\eta,\xi - \xi_A) = 0$, (i) is equivalent to
(ii). (ii) implies (iii) by the law of cosines (3.14) applied to
$\xi - \xi_A$, $\xi_A - \eta$ and their sum. Conversely, if (iii) holds we can
replace $\eta$ by the vector $\eta + \xi_A$ in (3.17) and get

$\|(\xi - \xi_A) - \eta\|^2 = \|\xi - \xi_A\|^2 + \|\eta\|^2.$

Applying (3.14) to $\xi - \xi_A$, $\eta$ and their difference we get
$(\xi - \xi_A,\eta) = 0$. So (iii) implies (ii). QED

Define the function $P_A: V \to A$ by $P_A(\xi) = \xi_A$. Using (i) it is
easy to check that $P_A$ is a linear map and that $P_A(\xi) = \xi$ if $\xi$ lay
in A to begin with. $P_A$ is called the orthogonal projection of V
onto A.

Equation (3.17) explains the sense in which $\xi_A$ is the best
approximation of $\xi$ by a vector in A: since $\|\xi_A - \eta\|^2 \geq 0$, no
vector $\eta$ in A is closer to $\xi$ than $\xi_A$ is.

When $V = R^I$ with the usual inner product $\xi_A$ is the usual least
squares approximation. When the inner product is ${}_p(\ ,\ )$ with
$p \in \Delta^\circ$, $\|\xi\|^2$ is the variance of $\xi$ plus the square of the mean of $\xi$. So
(3.18) justifies our remarks about partitioning variance in Sec. 2.
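
The simplest case of this partition takes A to be the line of constant vectors. Under the covariance metric the projection of $\xi$ onto A is the constant vector whose value is the mean of $\xi$, and (3.18) becomes "second moment = variance + mean squared". The sketch below checks this; the distribution, the vector, and the function names are illustrative assumptions.

```python
def inner_p(p, xi, eta):
    """Covariance inner product (3.12)."""
    return sum(p_i * a * b for p_i, a, b in zip(p, xi, eta))


def project_onto_constants(p, xi):
    """Orthogonal projection of xi onto the span of the constant vector 1."""
    one = [1.0] * len(xi)
    mean = inner_p(p, xi, one) / inner_p(p, one, one)
    return [mean] * len(xi)


p = [0.5, 0.3, 0.2]            # hypothetical interior distribution
xi = [1.0, 2.0, 4.0]

xi_A = project_onto_constants(p, xi)
residual = [a - b for a, b in zip(xi, xi_A)]

total = inner_p(p, xi, xi)                  # ||xi||^2
variance = inner_p(p, residual, residual)   # ||xi - xi_A||^2
mean_sq = inner_p(p, xi_A, xi_A)            # ||xi_A||^2
```

Here `total` equals `variance + mean_sq` up to rounding, which is (3.18) for this choice of A.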

Now we describe the elements of advanced calculus as they
appear in modern texts like Edwards [8] or Spivak [32].

Suppose U is an open subset of a Euclidean vector space $V_1$.
This means that whenever a point $x \in U$ then all points sufficiently
close to x also lie in U, i.e. there exists $\epsilon > 0$ depending on x
such that $\|h\| < \epsilon$ implies $x + h \in U$. Let f be a function on U
with values in a Euclidean vector space $V_2$. So $f: U \to V_2$. The
derivative of f at a point $x \in U$ is a linear map written
$d_x f: V_1 \to V_2$. It is the unique linear map such that the function
$f(x) + d_x f(h)$ (with x fixed and h varying) gives the best
approximation to f near x, i.e. to $f(x + h)$. This means that not only
does the error term $f(x + h) - f(x) - d_x f(h)$ approach 0 as h
approaches 0 (and so $x + h$ approaches x), but the ratio between
the error term and the length of h also goes to zero. We write
this as follows:

(3.19)  $f(x + h) = f(x) + d_x f(h) + o(h)$

where the error term denoted $o(h)$ is defined for $\|h\|$ sufficiently
small and satisfies:

(3.20)  $\|o(h)\|_2 / \|h\|_1 \to 0$ as $\|h\|_1 \to 0$.

We will usually drop the subscripts on the length which here
are reminders of which Euclidean metric (whether in $V_1$ or $V_2$) is
being used.

The derivative of a function need not exist, for example,
$f(x) = x^{1/3}$ defined from R to R is not differentiable at $x = 0$,
but unless otherwise mentioned all of the functions we will look at
are smooth or $C^\infty$, meaning that all derivatives exist and are continuous.

When $V_1 = R^n$ and $V_2 = R^m$ then with respect to the standard
bases the derivative $d_x f$ can be represented by an m x n matrix. This
matrix is just the Jacobian matrix of partial derivatives. If
$f(x) = (f_1(x),\dots,f_m(x))$ and $x = (x_1,\dots,x_n)$ then the matrix $(a_{ij})$ is
given by $a_{ij} = \partial f_i/\partial x_j$ ($i = 1,\dots,m$ and $j = 1,\dots,n$).

Taking the derivative is itself a linear operation. If
$f,g: U \to V_2$ and $t \in R$ then $d_x(tf + g) = t(d_x f) + (d_x g)$. So in the
standard case the Jacobian matrix of the sum of two functions is the
sum of the corresponding Jacobian matrices. We will also need the
chain rule which says that the derivative of a composite map is the
composition of the derivatives. If $f: V_1 \to V_2$ and $g: V_2 \to V_3$ then
the composite function is $g \circ f: V_1 \to V_3$ defined by $g \circ f(x) = g(f(x))$
for $x \in V_1$. Now for x in $V_1$ we can take the derivatives of f and
$g \circ f$ at x and the derivative of g at $f(x)$. We get linear maps
$d_x f: V_1 \to V_2$, $d_{f(x)}g: V_2 \to V_3$ and $d_x(g \circ f): V_1 \to V_3$. The chain rule
says:

(3.21)  $d_x(g \circ f) = (d_{f(x)}g) \circ (d_x f).$

In the standard case this implies that the Jacobian of the composite
$g \circ f$ is the product of the Jacobians of g and of f.
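
The chain rule (3.21) can be checked numerically by approximating Jacobians with central differences. This is a sketch only: the maps `f` and `g`, the base point, and the step size are hypothetical, and the finite-difference Jacobian is an approximation rather than the exact derivative.

```python
import math


def jacobian(func, x, h=1e-6):
    """Central-difference Jacobian of func at x (rows: outputs, columns: inputs)."""
    m = len(func(x))
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = func(xp), func(xm)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(m)])
    return [[cols[j][i] for j in range(len(x))] for i in range(m)]


def mat_mul(b, a):
    """Matrix product b * a."""
    cols = list(zip(*a))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in b]


def f(x):                      # hypothetical map R^2 -> R^2
    return [x[0] * x[1], x[0] + x[1]]


def g(y):                      # hypothetical map R^2 -> R^1
    return [y[0] ** 2 + math.sin(y[1])]


x0 = [1.0, 2.0]
jac_composite = jacobian(lambda x: g(f(x)), x0)
jac_product = mat_mul(jacobian(g, f(x0)), jacobian(f, x0))
```

The two 1 x 2 matrices agree to within the accuracy of the difference quotients; note that the Jacobian of g is evaluated at $f(x_0)$, not at $x_0$.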

When $V_1 = R$ and $V_2 = V$, so that $f(t)$ is a vector-valued function
of a real variable, $d_t f$ is a linear map from R to V. We saw
earlier that such a map can be identified with the vector $d_t f(1)$ and
we denote this vector $f'(t)$. So $d_t f(s) = sf'(t)$. $f'(t)$ is the limit
of the familiar difference quotient $(f(t + s) - f(t))/s$ as s
approaches 0. On the other hand, when $V_1 = V$ and $V_2 = R$, $d_x f$ is a
linear form on V called the differential of f at x. If $f: U \to R$
then the differential $df: U \to V^*$ associates to x the form $d_x f$.

Now if we use the Euclidean metric on V, the Riesz representation
theorem (Thm. 1) associates to $d_x f$ a vector in V. This is the
gradient of f at x, denoted $\mathrm{grad}_x f$. It is defined by:

(3.22)  $d_x f(h) = (\mathrm{grad}_x f, h).$

The gradient depends on the particular Euclidean metric on V. Up
to now we have only needed the metric to make the limit statements
like (3.20) make sense. But any Euclidean metric will give the same
idea of limit, the same topology, on V. So the derivatives like $d_x f$
are independent of the choice of metric. This is not true of the
gradient and we will later see different kinds of gradients.

If h is a vector of unit length in V, then $d_x f(h)$ is called
the directional derivative of f in the direction h. It is the
limit of the difference quotient $(f(x + sh) - f(x))/s$ as s
approaches 0. By (3.22) and (3.13), $d_x f(h) = \|\mathrm{grad}_x f\|\cos\theta$ where $\theta$
is the angle between h and the gradient. Clearly, this is largest
when $\cos\theta = 1$, i.e. $\theta = 0$. So the gradient has the direction of
greatest increase of the function f.

In general, the derivative $d_x f$ describes the behavior of f
near x. So calculus is used to solve local problems. For example,
if $f: R \to R$ then $f'(t) = 0$ and $f''(t) < 0$ implies f has a local
maximum at t, i.e. $f(t) > f(s)$ for s different from but close to
t. It may happen that far from t, f becomes larger than $f(t)$.

The most important example of a problem which can be solved
locally by calculus is described by the inverse function theorem.

Suppose $U_1$ is open in $V_1$ and $f: U_1 \to V_2$. f is called a
diffeomorphism if it has a smooth inverse map, i.e. if f maps $U_1$
one-to-one and onto an open set $U_2$ in $V_2$ and the inverse map
$f^{-1}: U_2 \to U_1$ is smooth. When f is a diffeomorphism and $x \in U_1$ then the chain
rule implies that the linear map $d_x f$ is a linear isomorphism and that
its inverse is the derivative of $f^{-1}$ taken at $f(x)$. Thus, if f is
invertible so is its derivative at each point. The inverse function
theorem is the converse, at least locally. For the proof see
[8, p. 185] or [32, p. 35].

3 Theorem: Let $f: U \to V_2$ be a smooth map with U open in $V_1$ and let
$x \in U$. If the derivative $d_x f: V_1 \to V_2$ is a linear isomorphism then
f is locally a diffeomorphism near x, i.e. there exists an open set
$U_1 \subset U$ with $x \in U_1$ such that f restricted to $U_1$ is a diffeomorphism.

If the set I has n elements and k is a whole number with
$k \leq n$, then a k-dimensional manifold in the vector space $R^I$ is a
subset M of $R^I$ which looks locally, near each point, like a curved
piece of a k-dimensional subspace. There are two equivalent ways of
making this precise.

First, we can define M near $x \in M$ explicitly by defining a
coordinate system on M near x. This is a function $h: U \to M$ where
U is open in $R^k$ and h maps U one-to-one and onto all of the
points of M near x (i.e. the intersection of M with some open
set in $R^I$). h is assumed to have rank k. This means that if we
regard h as a function from U to $R^I$ the derivative $d_u h: R^k \to R^I$
is one-to-one at every point u of U. This description is called
explicit because it parametrizes the points of M near x by k
real parameters. For example, the piece of the circle of radius 1
in the interior of the first quadrant of $R^2$ is the image of the
function $f(t) = (\cos t, \sin t)$ with t varying in the open interval of
R between 0 and $\pi/2$. Similar pieces can be constructed near any
point of the circle. This example illustrates the typical fact
that often no coordinate system can be found which works on the
entire manifold. The manifold is obtained by gluing together many
coordinate patches.

The implicit description of the manifold near x is as the
level surface of a family of functions. This means we have a function
$F: G \to R^{n-k}$, with G some open subset of $R^I$ containing x, such that
the points of the manifold in G are precisely the solutions of the
equations $F(y) = \xi$ for some fixed vector $\xi$ in $R^{n-k}$, i.e.
$M \cap G = F^{-1}(\{\xi\})$. F is assumed to have rank $n - k$ at all points of
$M \cap G$. This means that the derivative $d_y F: R^I \to R^{n-k}$ is onto for
every point y in $M \cap G$. We can think of F as a list of $n - k$
scalar functions and the equation $F(y) = \xi$ as a list of $n - k$
constraints, which reduce the number of degrees of freedom
(= dimension) from n to k. Frequently an implicit
description can be given for the entire manifold. For example, the $n - 1$
dimensional sphere of radius r in $R^I$ is given by the single scalar
equation $F(y) = r^2$ where $F(y) = \sum_i y_i^2$. The subset $\Delta^\circ$ of $R^I$ is defined
by the equation $F(y) = 1$ where $F(y) = \sum_i y_i$. Here the open set G
consists of the set of vectors with positive coordinates.

Just as the derivative at a point of a function is a linear
approximation to the function, there is at every point of a manifold
a linear subspace which approximates the manifold.

A path through x in M is a function v from an open
interval in R to M such that $v(t) = x$ for some t in the interval.
Taking the derivative at t we get the vector $v'(t)$ which is called
a tangent vector at x. The collection of all tangent vectors at x
is a linear subspace of $R^I$ called the tangent space of M at x
and denoted $T_x M$. It is not clear from this definition that $T_x M$ is a
subspace, but $T_x M$ can also be defined using the explicit or implicit
description of M near x. If $h: U \to M$ with U open in $R^k$ is a
coordinate system near x, with $h(u) = x$, then every path in M through x
can be described using these coordinates. It then follows from the chain
rule (3.21) that $T_x M$ is the image of the linear map $d_u h$. Since $d_u h$
is one-to-one $T_x M$ is a k dimensional subspace of $R^I$. On the other
hand, if $F: G \to R^{n-k}$ with $M \cap G = F^{-1}(\{\xi\})$ then every path in M maps
under F to a constant path in $R^{n-k}$. Since constants have derivative
0, $T_x M$ is the kernel of the linear map $d_x F: R^I \to R^{n-k}$, i.e.
$T_x M = \{y \in R^I : d_x F(y) = 0\}$. For example, if $F(y) = \sum_i y_i$ then
$d_x F(y) = \sum_i y_i$ and so the tangent space $T_p\Delta^\circ = \{y \in R^I : \sum_i y_i = 0\}$ for
all p in $\Delta^\circ$. On the other hand, if $F(y) = \sum_i y_i^2$ then
$d_x F(y) = 2\sum_i x_i y_i = 2(x,y)$ where $(\ ,\ )$ is the usual inner product.
So for the sphere of radius r, the tangent space at x consists of
all vectors orthogonal to x. Notice that $T_p\Delta^\circ$ is the same subspace
for all p, but the tangent space of the sphere at x changes as x
changes.
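
Both tangent-space examples can be probed numerically by differentiating a path. In the sketch below the two paths, the step size, and the function names are hypothetical; velocities are approximated by central differences, so the identities hold only up to discretization and rounding error.

```python
import math


def velocity(path, t, h=1e-6):
    """Central-difference approximation to the tangent vector path'(t)."""
    a, b = path(t + h), path(t - h)
    return [(a_i - b_i) / (2 * h) for a_i, b_i in zip(a, b)]


def simplex_path(t):
    """A path of positive vectors whose coordinates sum to 1."""
    w = [math.exp(0.3 * t), 2.0 * math.exp(-0.5 * t), 1.5 * math.exp(0.1 * t)]
    total = sum(w)
    return [w_i / total for w_i in w]


def sphere_path(t, r=2.0):
    """A path on the sphere of radius r, obtained by rescaling a curve."""
    w = [1.0 + t, 2.0 - t, 1.0 + t * t]
    norm = math.sqrt(sum(w_i * w_i for w_i in w))
    return [r * w_i / norm for w_i in w]


t0 = 0.4
sum_of_simplex_velocity = sum(velocity(simplex_path, t0))

x = sphere_path(t0)
v = velocity(sphere_path, t0)
sphere_dot = sum(x_i * v_i for x_i, v_i in zip(x, v))
```

The first quantity is (approximately) zero, reflecting $\sum_i y_i = 0$ on the tangent space of the simplex; the second is (approximately) zero, reflecting orthogonality to x on the sphere.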

If $M_1$ is a manifold in $R^{I_1}$, $M_2$ is a manifold in $R^{I_2}$ and f is
a function from $M_1$ to $M_2$ we can extend the definition of f to a
function from U to $R^{I_2}$ where U is some open set in $R^{I_1}$ containing
$M_1$. Then for x in $M_1$ we can define the derivative $d_x f: R^{I_1} \to R^{I_2}$.
There are many different ways of extending f and $d_x f$ will depend
on which extension is used. However $d_x f$ maps $T_x M_1$ into $T_{f(x)}M_2$ and
this part of $d_x f$ does not depend on the choice of extension, so we can
define the linear map $d_x f: T_x M_1 \to T_{f(x)}M_2$ without ambiguity. The
reason is that if v is a path in $M_1$ through x then the composition
$f \circ v$ is a path in $M_2$ through $f(x)$, and $(f \circ v)'(t) = d_x f(v'(t))$ by the
chain rule. This allows us to do calculus on manifolds. For example,
if $d_x f$ is a linear isomorphism of $T_x M_1$ onto $T_{f(x)}M_2$ then one can
extend the inverse function theorem to show that f is a
diffeomorphism between some open set of $M_1$ containing x and some open set
of $M_2$ containing $f(x)$.

In particular, if $R^{I_2} = M_2 = R$ then the differential of f, df,
associates to each $x \in M_1$ the linear form $d_x f$ on $T_x M_1$.

Dual to the idea of the differential of a function is the idea
of a vectorfield. A vectorfield X on an open set U of $R^I$ is just
a function $X: U \to R^I$. A vectorfield on a manifold M in $R^I$ is a
function $X: M \to R^I$ such that $X(x) \in T_x M$ for all $x \in M$, i.e. X is
always tangent to M. Via the Kronecker product, eq. (3.5), we can
associate to a function $f: M \to R$ and a vectorfield X on M a new
function $\langle df,X \rangle$ defined by:

(3.23)  $\langle df,X \rangle(x) = \langle d_x f, X(x) \rangle = d_x f(X(x)), \qquad x \in M.$

With f fixed we can regard (3.23) as a way that functions operate
on vectorfields to get new functions, or with X fixed we can regard
(3.23) as the way a vectorfield operates on functions. From the
latter viewpoint we define the vectorfield $\partial_i$ on $R^I$ to be constantly
the standard basis vector $e^i$. The notation comes from the fact that

$\langle df,\partial_i \rangle = \partial f/\partial x_i$ at each point x of $R^I$.

This is because $\langle df,\partial_i \rangle$ is just the directional derivative in the $e^i$
direction. Since $\{e^i\}$ is a basis every vectorfield X on M can
be written uniquely as a linear combination $X = \sum_i X_i\partial_i$ where each $X_i$
is a real-valued function on M. Note that the $\partial_i$'s themselves
usually do not lie in $T_x M$ and so not every choice of functions $X_i$ will
define a vectorfield on M.

The Kronecker product is bilinear and so

(3.24)  $\langle df,X \rangle = \sum_i X_i\langle df,\partial_i \rangle = \sum_i X_i\,\partial f/\partial x_i.$

Note here that as the $\partial_i$'s are not vectorfields on M the
expressions $\partial f/\partial x_i$ will depend on the choice of extension of f to a
neighborhood of M. However, $\langle df,X \rangle$ itself does not depend on this
choice.

A vectorfield on M is the manifold analogue of a differential
equation. A solution path for the vectorfield is a path $v(t)$ in M
such that for all t:

(3.25)  $v'(t) = X(v(t)).$

In local coordinates it is easy to check that this is just an
ordinary differential equation in the k coordinates.

An important example of a vectorfield, a gradient field,
requires the notion of a Riemannian metric.

Any inner product on $R^I$ restricts to an inner product on each
subspace and in particular on the tangent spaces of a manifold M.
However, in many applications including those of this paper the inner
product which arises naturally from the problem will be different at
different points. A Riemannian metric on a manifold M is a smooth
choice of inner product $(\ ,\ )_x$ for each subspace $T_x M$. A manifold
equipped with a Riemannian metric is called a Riemannian manifold.

On a Riemannian manifold M we define the gradient $\nabla f$ of a
function $f: M \to R$. $d_x f$ is a linear form on $T_x M$ and so by Thm. 1
there exists a unique vector $\nabla_x f \in T_x M$ such that:

(3.26)  $(\nabla_x f, X)_x = \langle d_x f, X \rangle = d_x f(X), \qquad x \in M,\ X \in T_x M.$

The vectorfield $\nabla f$ thus depends not only on $f: M \to R$ but also on the
Riemannian metric. This is in contrast to df which depends only on f.

If $M_1$ is a submanifold of M, i.e. another manifold in $R^I$ with
$M_1 \subset M$, then for $x \in M_1$ the tangent space $T_x M_1$ is a subspace of $T_x M$
since every path in $M_1$ lies in M. So a Riemannian metric on M
restricts to define one on $M_1$. If $f: M \to R$ then f restricts to a
function $f|M_1: M_1 \to R$. Now for $x \in M_1$ the two gradients $\nabla_x f$ and
$\nabla_x(f|M_1)$ both satisfy (3.26) for vectors $X \in T_x M_1$. In addition,
$\nabla_x(f|M_1)$ itself lies in $T_x M_1$. So by Thm. 2 $\nabla_x(f|M_1)$ is the
perpendicular projection of $\nabla_x f$ into $T_x M_1$. For a connected submanifold,
i.e. every pair of points in $M_1$ can be joined by a path in $M_1$, this
implies the following:

3 Proposition: Let $M_1$ be a connected submanifold of a manifold M
and let $f: M \to R$. The following conditions are equivalent:

(i)   f is constant on $M_1$.

(ii)  $d_x f(X) = 0$ for all $X \in T_x M_1$, $x \in M_1$.

(iii) $\nabla_x(f|M_1) = 0$ for all $x \in M_1$.

(iv)  $\nabla_x f$ in $T_x M$ is orthogonal to the subspace $T_x M_1$ for all
      $x \in M_1$.

On a Riemannian manifold the arclength of a path is defined.
Thinking of t as time the velocity vector at time t of the path
$v(t)$ in M is $v'(t) \in T_{v(t)}M$. The speed is the length of the
velocity measured using the inner product at $v(t)$. So
$\|v'(t)\| = (v'(t),v'(t))_{v(t)}^{1/2}$. Integrating the speed we get the length
of the path. This enables us to define the distance between two
points $x_1$ and $x_2$ of M as the greatest lower bound of the lengths of
all paths in M connecting $x_1$ and $x_2$. A path which achieves this
length (the shortest distance between the two points) is called a
geodesic. For example, on the sphere with the Riemannian metric
obtained by restricting the usual inner product on $R^I$, geodesics are
pieces of great circles, i.e. the intersection of the sphere with two
dimensional subspaces (planes through zero).

A diffeomorphism g between two Riemannian manifolds $M_1$ and $M_2$ is
called an isometry if $d_x g: T_x M_1 \to T_{g(x)}M_2$ is an isometry at every
point x of $M_1$. While diffeomorphisms between pieces of manifolds
of the same dimension are quite common, isometries are quite rare.
This is related to the rigidity of geometry as opposed to the
flexibility of topology ("rubber-sheet geometry"). Isometries preserve
the structures of Riemannian geometry. They map geodesics to
geodesics and relate gradient vectorfields, i.e.
$d_x g(\nabla_x(f \circ g)) = \nabla_{g(x)}f$ if g is an isometry but not, in general, if it
is not.

4. The Shahshahani Metric 

Fisher's Fundamental Theorem of Natural Selection says that
along the solution curves of the selection differential equation,
(1.4), mean fitness, m, is constantly increasing. Kimura's Maximum
Principle says that the direction of motion is the direction of
greatest increase. These results suggest that the selection vectorfield on
$\Delta^\circ$ associated with the selection differential equation (sensu (3.25)),
should be the gradient of m. When one computes the gradient one
gets the wrong equation. However, we saw in the previous section
that the concept of the gradient of a function depends upon a choice
of Riemannian metric. When we compute grad m we are using the metric
obtained from the inner product on $R^I$. A careful statement of
Kimura's theorem, e.g. see [6, p. 230], shows that the concept of
direction, i.e. unit vector, means unit variance. Since the
definition of variance depends on the distribution p it becomes clear
that we should look for a non-constant Riemannian metric on $\Delta^\circ$. The
appropriate Riemannian metric was discovered by Shahshahani in [28].
As we will see this metric plays a central role in interpreting
selection, recombination and mutation geometrically.

For I a set of n elements define the subsets of $R^I$:
$P = \{x : x_i \geq 0 \text{ for all } i\}$ and $P^\circ = \{x : x_i > 0 \text{ for all } i\}$. Then
$\Delta = \{p \in P : \sum_i p_i = (p,1) = 1\}$ and $\Delta^\circ = \Delta \cap P^\circ = \{p \in P^\circ : \sum_i p_i = (p,1) = 1\}$.
Here $(\ ,\ )$ is the usual inner product on $R^I$ and 1 is the function
on I constantly 1, regarded as a vector in $R^I$. $P^\circ$, as an open
subset of $R^I$, is a manifold with tangent space $R^I$ at every point. Recall
that $\Delta^\circ$ is a manifold of dimension $n - 1$ whose tangent space at every
point is the subspace $(R^I)_0 = \{X : \sum_i X_i = (X,1) = 0\}$. For $x \in P^\circ$ we
define the Shahshahani metric $(\ ,\ )_x$ on $R^I = T_x P^\circ$ by:

(4.1)  $(X,Y)_x = \sum_i x_i^{-1}X_i Y_i, \qquad X,Y \in R^I,\ x \in P^\circ.$

In particular, regarding x itself as a vector in $R^I$ we have

(4.2)  $(X,x)_x = (X,1).$

So for $p \in \Delta^\circ$, $(p,p)_p = 1$ and $T_p\Delta^\circ = \{X \in R^I : (X,p)_p = 0\}$. With respect
to the usual inner product the constant vector $n^{-1/2}1$ is the unit
vector in $R^I$ perpendicular to $T_p\Delta^\circ$. But with respect to $(\ ,\ )_p$ it is
p itself which is the perpendicular unit vector.

The Shahshahani metric restricts to a metric on $\Delta^\circ$. Here if
we think of $X \in T_p\Delta^\circ = (R^I)_0$ as a little change in the distribution p,
we can write the square of the magnitude two ways:

(4.3)  $\|X\|_p^2 = \sum_i \frac{X_i^2}{p_i} = \sum_i p_i\left(\frac{X_i}{p_i}\right)^2.$

In the first form we weight the square of $X_i$ by the inverse of
$p_i$ because we regard changes of smaller values of $p_i$ as more
significant. In the second form, we regard $X_i/p_i$ as the relative change
and average the squares of the relative changes by the distribution.

In the first form this is referred to as the $\chi^2$ measure of
genetic distance (see Jacquard, [17, p. 427] or Kempthorne, [20, p.
178]). This suggests an interesting coordinate change which among
other applications reveals the relationship between two measures of
genetic distance.

1 Theorem: The smooth map $f: R^I \to P$ defined by $f(z) = x$ with
$x_i = z_i^2/4$ admits a smooth inverse on $P^\circ$, $g: P^\circ \to P^\circ$, defined by $g(x) = z$
with $z_i = 2\sqrt{x_i}$. g is an isometry between $P^\circ$ with the Shahshahani
metric and $P^\circ$ with the usual Euclidean metric. If
$S = \{z \in R^I : \sum_i z_i^2 = 4\}$ is the sphere of radius 2 then $f^{-1}(\Delta) = S$
and so g restricts to an isometry between $\Delta^\circ$ and $S \cap P^\circ$.

Proof: For $x \in P^\circ$ the derivative is given by $d_x g(X) = Z$ with
$Z_i = X_i/\sqrt{x_i}$ and so:

$(d_x g(X), d_x g(Y)) = \sum_i \frac{X_i Y_i}{x_i} = (X,Y)_x.$

QED
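
The computation in the proof is easy to confirm numerically. This is a sketch with hypothetical sample values for `x`, `X` and `Y`; the function names are chosen for illustration, and `dg` implements the derivative formula $Z_i = X_i/\sqrt{x_i}$ stated in the proof.

```python
import math


def shahshahani(x, X, Y):
    """Shahshahani inner product (4.1) at the point x."""
    return sum(X_i * Y_i / x_i for x_i, X_i, Y_i in zip(x, X, Y))


def g(x):
    """The coordinate change z_i = 2 * sqrt(x_i)."""
    return [2.0 * math.sqrt(x_i) for x_i in x]


def dg(x, X):
    """Derivative of g at x applied to X: components X_i / sqrt(x_i)."""
    return [X_i / math.sqrt(x_i) for x_i, X_i in zip(x, X)]


def dot(a, b):
    """Usual inner product."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


x = [0.5, 0.3, 0.2]              # a point of the open simplex
X = [0.10, -0.04, -0.06]
Y = [-0.02, 0.05, -0.03]

euclidean_side = dot(dg(x, X), dg(x, Y))
shahshahani_side = shahshahani(x, X, Y)
radius_squared = dot(g(x), g(x))   # equals 4 * sum(x)
```

Since x is a distribution, `radius_squared` is 4, so `g(x)` lies on the sphere of radius 2.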


2 Corollary: If $p,q \in \Delta^\circ$ then the geodesic distance between p and
q with respect to the Shahshahani metric on $\Delta^\circ$ is given by

$d(p,q) = 2\arccos\left(\sum_i \sqrt{p_i q_i}\right).$

Here the principal value of arc cos is chosen, measured between 0
and $\pi/2$ radians. This measure of distance is the arc measure of
Cavalli-Sforza and Edwards (see Jacquard, [17, p. 425]).

Proof: An isometry preserves geodesic distance and so the distance
between p and q in $\Delta^\circ$ is the same as the distance between $g(p)$
and $g(q)$ in $S \cap P^\circ$. But on the sphere geodesic distance is measured
along the arc of great circles. This arc distance is the radius (= 2)
times the angle between $g(p)$ and $g(q)$ measured in radians. The
cosine of this angle is just the usual inner product between the unit
vectors $\frac{1}{2}g(p)$ and $\frac{1}{2}g(q)$, namely $\sum_i \sqrt{p_i q_i}$. Finally, since these
vectors lie in $S \cap P^\circ$ (and so does the arc between them) the inner
product is positive and the arc cosine is between 0 and $\pi/2$. QED

Thus, the arc measure of genetic distance is obtained by
integrating the $\chi^2$ metric along a geodesic in $\Delta^\circ$. Alternatively, the $\chi^2$
metric is the "infinitesimal" version of the arc cosine metric.
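
The "infinitesimal" relationship can be seen numerically: for a small displacement $\epsilon X$ of p, the arc distance is close to $\epsilon$ times the $\chi^2$ length of X from (4.3). The sketch below uses hypothetical values and names; the clamp inside `arc_distance` is a guard against rounding pushing the cosine slightly above 1, not part of the formula.

```python
import math


def arc_distance(p, q):
    """Arc measure: 2 * arccos(sum_i sqrt(p_i * q_i))."""
    c = sum(math.sqrt(p_i * q_i) for p_i, q_i in zip(p, q))
    return 2.0 * math.acos(min(1.0, c))


def chi_square_length(p, X):
    """Length of the tangent vector X at p in the metric (4.3)."""
    return math.sqrt(sum(X_i * X_i / p_i for p_i, X_i in zip(p, X)))


p = [0.5, 0.3, 0.2]
X = [1.0, -0.4, -0.6]            # sums to zero, so it is tangent to the simplex
eps = 1e-3
q = [p_i + eps * X_i for p_i, X_i in zip(p, X)]

d = arc_distance(p, q)
first_order = eps * chi_square_length(p, X)
```

For these values the two numbers agree to a small fraction of a percent, and the agreement improves as `eps` shrinks.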

The second sum in (4.3) has a differential equation
interpretation. Let $X = \sum_i X_i\partial_i$ be a vectorfield on $P^\circ$ with the $X_i$'s real
functions of $x \in P^\circ$. By (3.25) the associated differential equation
is $dx/dt = X$, or in coordinate form:

(4.4)  $\frac{dx_i}{dt} = X_i \qquad (i \in I).$

Thus, $X_i(x)$ is the absolute growth rate of $x_i$ when the system is at
state x. If we define $\xi$ by $X_i = x_i\xi_i$ or $\xi_i = X_i/x_i$, then:

(4.5)  $\frac{dx_i}{dt} = x_i\xi_i$  or  $\frac{d\ln x_i}{dt} = \xi_i \qquad (i \in I).$

So $\xi_i(x)$ is the relative growth rate of $x_i$ when the system is at
state x. Define $|x| = \sum_i x_i = (x,1)$ to be the total population size
and $p_i = x_i/|x|$ to be the distribution at x. The mean of $\xi$ is
$\bar\xi = \sum_i p_i\xi_i$ and so $|x|\bar\xi = \sum_i x_i\xi_i$. If we sum the left hand system in
(4.5) we get

(4.6)  $\frac{d|x|}{dt} = |x|\bar\xi$  or  $\frac{d\ln|x|}{dt} = \bar\xi.$

Subtracting the logarithmic derivatives we get:

(4.7)  $\frac{dp_i}{dt} = p_i(\xi_i - \bar\xi)$  or  $\frac{d\ln p_i}{dt} = \xi_i - \bar\xi.$

So $\xi$ normalized to mean zero is the relative growth rate of $p_i$.
(4.3) suggests the relationship between the Shahshahani metric on
absolute rates and the covariance inner product on relative rates.

3 Proposition : For $x \in P$ let $|x| = \sum_i x_i$ and $p = x/|x| \in \mathring{\Delta}$. Define
$F_x : R^I \to R^I$ by $F_x(\xi) = X$ with $X_i = x_i \xi_i$. If $F_x(\xi) = X$ and
$F_x(\eta) = Y$ then

(4.8)    $(X,Y)_x = (\xi,\eta)_p \cdot |x|$

where $(\ ,\ )_x$ is the Shahshahani metric on $R^I = T_x P$ and
$(\xi,\eta)_p = \sum_i p_i \xi_i \eta_i$. In particular, if x = p then $F_p$ is an isometry
between $R^I$ with the inner product $(\xi,\eta)_p$ and $R^I = T_p P$ with the
Shahshahani inner product $(\ ,\ )_p$. In this case, the mean
$\bar\xi = (\xi,1)_p = (X,p)_p = (X,1)$. So $F_p$ maps the vectors with zero mean
onto the tangent space $T_p\Delta$. Furthermore, normalizing the vector $\xi$
to mean zero corresponds to projecting the vector $X \in T_p P$ to $T_p\Delta$
orthogonally with respect to the Shahshahani metric.


Proof: (4.8) and the mean equations are easy direct computations.
The projection result says

(4.9)    $F_p(\xi - \bar\xi) = X - (X,p)_p\, p = X - (X,1)\, p$

which follows from $F_p(1) = p$ and the mean equations. The vector on
the right is the orthogonal projection (see Thm. 3.2). QED
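The identities (4.8) and (4.9) can be checked numerically. The sketch below is a minimal illustration in plain Python; it assumes the metric $(X,Y)_x = \sum_i X_i Y_i / x_i$ (the form consistent with (4.8)), and the helper names and sample vectors are hypothetical.

```python
def shahshahani(X, Y, x):
    """Shahshahani inner product at x: sum of X_i * Y_i / x_i (assumed form)."""
    return sum(a * b / c for a, b, c in zip(X, Y, x))


def covariance_inner(xi, eta, p):
    """Inner product (xi, eta)_p = sum of p_i * xi_i * eta_i."""
    return sum(w * a * b for w, a, b in zip(p, xi, eta))


def F(x, xi):
    """The map F_x: relative rates xi -> absolute rates X_i = x_i * xi_i."""
    return [c * a for c, a in zip(x, xi)]


def project(X, p):
    """Right-hand side of (4.9): X - (X, 1) p."""
    s = sum(X)
    return [a - s * w for a, w in zip(X, p)]


x = [1.0, 2.0, 5.0]
size = sum(x)
p = [c / size for c in x]
xi = [0.3, -0.1, 0.2]
eta = [1.0, 0.5, -0.4]

lhs_48 = shahshahani(F(x, xi), F(x, eta), x)
rhs_48 = covariance_inner(xi, eta, p) * size

xi_bar = covariance_inner(xi, [1.0, 1.0, 1.0], p)
lhs_49 = F(p, [a - xi_bar for a in xi])
rhs_49 = project(F(p, xi), p)
```

Here `lhs_48` and `rhs_48` are the two sides of (4.8), and `lhs_49`, `rhs_49` are the two sides of (4.9) at x = p.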

If $f: P \to R$ is a smooth function then the gradient of f with
respect to the usual inner product is:

(4.10)    $\operatorname{grad}_x f = \sum_i \dfrac{\partial f}{\partial x_i}\, \partial_i.$

For if $X = \sum_i X_i \partial_i$, then $(\operatorname{grad}_x f, X) = \sum_i \dfrac{\partial f}{\partial x_i} X_i = d_x f(X)$. For the
gradient with respect to the Shahshahani metric, denoted $\nabla f$, we must
have $(\nabla_x f, X)_x = d_x f(X)$ and so:

(4.11)    $\nabla_x f = \sum_i x_i \dfrac{\partial f}{\partial x_i}\, \partial_i = F_x(\operatorname{grad}_x f).$

This means that a vectorfield on P is the gradient of f with
respect to the usual metric if the absolute rate of change for $x_i$ is
the partial derivative $\partial f/\partial x_i$. It is the gradient of f with
respect to the Shahshahani metric if the relative rate of change for
$x_i$ is $\partial f/\partial x_i$.

For the restriction of f to $\mathring{\Delta}$, Prop. 3.3 says that the
gradient of $f|\Delta$ with respect to $\Delta$ is the orthogonal projection of
the gradient of f with respect to P. Denoting this gradient $\bar\nabla f$
or $\nabla(f|\Delta)$ we have from (4.9):

(4.12)    $\bar\nabla_p f = \nabla_p f - (\nabla_p f, p)_p\, p = F_p(\operatorname{grad}_p f - \overline{\operatorname{grad}_p f}).$


We should mention that Shahshahani's definition of his metric
differs from ours by a factor of |x|. His original definition has the
advantage that the selection vectorfield on P (and not just on $\Delta$)
is a gradient with respect to his metric. This is not true in our
case. We have chosen our definition to get Thm. 1 and (4.11). The
two definitions agree on $\mathring{\Delta}$ where most of our applications will take
place.

We conclude this section by defining and computing the gradients
of certain special functions. In the table below, $a, a', b \in R^I$
and $|b| = (b,1) = 0$.


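4 Table

  f(x)                             $\nabla_x f$                             $\bar\nabla_p f = \nabla_p(f|\Delta)$

  $E^a(x) = \sum_i x_i a_i$          $\sum_i x_i a_i\, \partial_i$                $\sum_i p_i (a_i - \bar a)\, \partial_i$

  $L^b(x) = \sum_i b_i \ln x_i$      $\sum_i b_i\, \partial_i$                    $\sum_i b_i\, \partial_i$

  $H(x) = -\sum_i x_i \ln x_i$       $-\sum_i x_i (\ln x_i + 1)\, \partial_i$     $-\sum_i p_i (\ln p_i + H(p))\, \partial_i$

The following inner product equations will be useful:

(4.13)    $(\nabla_x E^a, \nabla_x E^{a'})_x = \sum_i x_i a_i a'_i$
          $(\bar\nabla_p E^a, \bar\nabla_p E^{a'})_p = \operatorname{Cov}_p(a_i, a'_i)$
          $(\nabla_x E^a, \nabla_x L^b)_x = \sum_i a_i b_i = (a,b)$
          $(\nabla_x H, \nabla_x L^b)_x = -L^b(x).$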

5. The Product Theorems and Epistasis

Recall the map $E^\alpha: \Delta \to \Delta_\alpha$ which associates to a distribution
p on the product I the marginal distribution $p^\alpha$ on the factor $I_\alpha$
(c.f. equation (0.1)). Taking the product we get a map
$E = \prod_\alpha E^\alpha: \Delta \to \prod_\alpha \Delta_\alpha$ associating to p the list of marginal
distributions. E maps $\mathring{\Delta}$ onto $\prod_\alpha \mathring{\Delta}_\alpha$, i.e. the marginal distributions
of an interior distribution are interior. The dimension of $\prod_\alpha \Delta_\alpha$ is
$\sum_\alpha (n_\alpha - 1) = (\sum_\alpha n_\alpha) - \ell$ which is usually much smaller than the
dimension of $\Delta$ which is $n - 1 = (\prod_\alpha n_\alpha) - 1$. So for any list of
marginal distributions there are an infinite number of distributions
on the product with the given marginals. In other words, the
distributions of the individual genes by no means determine the
distribution of genotypes. One needs some additional information
describing the linkage of the genes. In the two-locus-two-allele case
$\Delta$ is 3 dimensional and $\Delta_1 \times \Delta_2$ is 2 dimensional. So one parameter is
sufficient to describe the degree of linkage. This parameter is
sometimes called the coefficient of linkage disequilibrium. We now
describe the generalization of this parameter and its relatives to
the multilocus case.

If $i, j \in I$ and S is a subset of $L = \{1,\ldots,\ell\}$, we introduced
the notation $\bar i, \bar j$ to stand for $\bar i = i_S j_{\bar S}$ and $\bar j = j_S i_{\bar S}$ ($\bar S$ is the
complement L - S). The pair of gametes $\bar i$ and $\bar j$ are obtained from the
pair i and j by exchange of genetic material at exactly the loci
in the set S. Now define:

(5.1)    $d^S_{ij} = p_i p_j - p_{\bar i} p_{\bar j}$

(5.2)    $L^S_{ij} = \ln (p_i p_j / p_{\bar i} p_{\bar j}) = \ln p_i p_j - \ln p_{\bar i} p_{\bar j}$
                $= \ln p_i - \ln p_{\bar i} - \ln p_{\bar j} + \ln p_j.$

Note that $d^S_{ij}$ is defined for $p \in \Delta$ but $L^S_{ij}$ is only defined for $p \in \mathring{\Delta}$.
ID ID 


1 Proposition : For any fixed $S \subset L$ and $p \in \Delta$, $d^S_{ij} = 0$ for all i and
j in I if and only if the loci of S and the loci of $\bar S$ are
independent with respect to p, that is:

(5.3)    $p_i = p^S(i_S) \cdot p^{\bar S}(i_{\bar S})$ for all $i \in I$.

Furthermore, if $p \in \mathring{\Delta}$ these conditions are equivalent to: $L^S_{ij} = 0$ for
all i and j in I.

Proof: $d^S_{ij} = 0$ if and only if $p_i p_j = p_{\bar i} p_{\bar j}$, and if $p \in \mathring{\Delta}$ this is
true if and only if $\ln p_i p_j = \ln p_{\bar i} p_{\bar j}$, or $L^S_{ij} = 0$. If (5.3) holds then
$p_i p_j = p^S(i_S)\, p^{\bar S}(i_{\bar S})\, p^S(j_S)\, p^{\bar S}(j_{\bar S}) = p_{\bar i} p_{\bar j}$. On the other hand if
$p_i p_j = p_{\bar i} p_{\bar j}$ for all i and j we can sum on j. On the left we get
$\sum_j p_i p_j = p_i \sum_j p_j = p_i$. On the right we can think of j as two
independent variables $j_S$ in $I_S$ and $j_{\bar S}$ in $I_{\bar S}$. Summing $p_{\bar i}$ over the $j_{\bar S}$
variables we get $p^S(i_S)$ by (0.2). Summing $p_{\bar j}$ over the $j_S$ variables
we get $p^{\bar S}(i_{\bar S})$. Hence (5.3). QED
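For two loci with two alleles each, Prop. 1 with S equal to the first locus reduces to the classical statement that the coefficient of linkage disequilibrium vanishes exactly for product distributions. A minimal Python sketch of this special case follows; the gamete encoding (pairs of 0/1 labels) and the two sample distributions are illustrative assumptions.

```python
from itertools import product

ALLELES = (0, 1)  # 0 = capital allele, 1 = lower-case allele (a labeling assumption)
GAMETES = list(product(ALLELES, ALLELES))


def marginals(p):
    """Marginal allele distributions at locus 1 and locus 2."""
    p1 = [sum(p[(a, b)] for b in ALLELES) for a in ALLELES]
    p2 = [sum(p[(a, b)] for a in ALLELES) for b in ALLELES]
    return p1, p2


def d_S(p, i, j):
    """(5.1) with S = {locus 1}: i-bar keeps i's first gene and takes j's second."""
    i_bar = (i[0], j[1])
    j_bar = (j[0], i[1])
    return p[i] * p[j] - p[i_bar] * p[j_bar]


def is_product(p, tol=1e-12):
    """Condition (5.3): p equals the product of its marginals."""
    p1, p2 = marginals(p)
    return all(abs(p[(a, b)] - p1[a] * p2[b]) < tol for a, b in GAMETES)


def all_d_vanish(p, tol=1e-12):
    """True when d^S_ij = 0 for every pair of gametes i, j."""
    return all(abs(d_S(p, i, j)) < tol for i in GAMETES for j in GAMETES)


linked = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
freq1 = [0.3, 0.7]
freq2 = [0.6, 0.4]
independent = {(a, b): freq1[a] * freq2[b] for a, b in GAMETES}
D = d_S(linked, (0, 0), (1, 1))
```

For `linked` the value `D` is 0.4·0.4 − 0.1·0.1 = 0.15, which is also the amount by which the (0, 0) frequency exceeds the product of its marginal frequencies.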


2 Corollary: For $p \in \Delta$, $d^S_{ij} = 0$ for all $i, j \in I$ and all $S \subset L$ if and
only if p lies in the Wright manifold $\Lambda$, i.e. equation (2.8) holds.
For $p \in \mathring{\Delta}$, $L^S_{ij} = 0$ for all $i, j \in I$ and all $S \subset L$ if and only if p
lies in $\mathring{\Lambda} = \Lambda \cap \mathring{\Delta}$.

Proof : (5.3) holds for all S. This means each locus is independent
of the rest with respect to p. So p is a product distribution. QED


Now fix $i, j \in I$ and $S \subset L$ and let p vary over $\mathring{\Delta}$. $L^S_{ij}$
becomes a real-valued function on $\mathring{\Delta}$. In the notation of Table 4 of
the previous section $L^S_{ij}$ is the restriction to $\mathring{\Delta}$ of a function
$L^b(x)$ where $b_k$ is 0 except for $k = i, j$ and $k = \bar i, \bar j$ where it is 1
and -1 respectively. Each of the coordinate functions of E mapping
to $\prod_\alpha R^{I_\alpha}$ is the restriction to $\Delta$ of a map $E^\alpha(x)(k)$ with $\alpha$ some
locus and k some gene at the $\alpha$ locus. By (0.1) these functions
are all of the form $E^a(x)$ in Table 4.

The different $L^S_{ij}$ functions are not independent of one another.
For example, the simplest relations among them are:

(5.4)    $L^S_{ij} = L^S_{ji} = -L^S_{\bar i \bar j}.$

There are more subtle linear relations as well. By a direct argument,
Shahshahani showed in [28] that there are $d = n - \sum_\alpha n_\alpha + \ell - 1$
independent functions $L^S_{ij}$. Putting them together we get a function
$L: \mathring{\Delta} \to R^d$. This number is exactly the number of dimensions left over
after we constrain by the map E. This motivates the following
theorem of Chapter II.

3 Theorem : The map $E \times L: \mathring{\Delta} \to (\prod_\alpha \mathring{\Delta}_\alpha) \times R^d$ is a diffeomorphism, that
is, it is one-to-one and onto with a smooth inverse map. Furthermore
if $E^a$ is any of the $\sum_\alpha n_\alpha$ coordinate functions of E and $L^b$ is any
$L^S_{ij}$ function (for example, the coordinate functions of L) then the
gradients $\bar\nabla E^a$ and $\bar\nabla L^b$ are everywhere orthogonal with respect to the
Shahshahani metric.

Sketch of the Proof : The orthogonality relations are consequences of
(4.13). From them it is not hard to show that the derivative of $E \times L$
is a linear isomorphism at each point. The inverse function theorem
(Thm. 4.3) then implies that $E \times L$ is locally a diffeomorphism. This
means that if the inverse exists it is smooth. A final topological
argument shows that the function is globally one-to-one and onto. QED
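The orthogonality claimed in Thm. 3 can be seen directly in the two-locus-two-allele case, where the coordinate functions of E are the four allele frequencies and there is a single independent linkage function. The sketch below uses the Table 4 formulas for the gradients; the gamete ordering and the sample point are illustrative assumptions.

```python
GAMETES = ["AB", "Ab", "aB", "ab"]  # assumed ordering of the four gametes


def grad_E(p, a):
    """Table 4: the simplex gradient of E^a has components p_i * (a_i - mean of a)."""
    a_bar = sum(p_i * a_i for p_i, a_i in zip(p, a))
    return [p_i * (a_i - a_bar) for p_i, a_i in zip(p, a)]


def shahshahani(X, Y, p):
    """Shahshahani inner product at p (assumed form sum X_i Y_i / p_i)."""
    return sum(u * v / w for u, v, w in zip(X, Y, p))


# Table 4: the gradient of L^b is the constant vector b when (b, 1) = 0.
# Here L = ln p_AB - ln p_Ab - ln p_aB + ln p_ab.
grad_L = [1.0, -1.0, -1.0, 1.0]

# Indicator vectors a for which E^a is one of the marginal allele frequencies.
indicators = {
    "A": [1.0, 1.0, 0.0, 0.0],
    "a": [0.0, 0.0, 1.0, 1.0],
    "B": [1.0, 0.0, 1.0, 0.0],
    "b": [0.0, 1.0, 0.0, 1.0],
}

p = [0.4, 0.1, 0.2, 0.3]  # an arbitrary interior distribution
inner_products = {name: shahshahani(grad_E(p, a), grad_L, p)
                  for name, a in indicators.items()}
```

Each inner product reduces to $\sum_i a_i b_i$, which vanishes because every allele indicator meets the pattern (+1, -1, -1, +1) once with each sign.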

A foliation of a manifold is a way of cutting the manifold up
into an infinite disjoint family of lower dimensional submanifolds,
called the leaves of the foliation. For example, by choosing a plane
in $R^3$ we can foliate $R^3$ by the family of all planes parallel to the
given one. By bending this picture you see a general picture of what
a two dimensional foliation of a three dimensional manifold looks like
(at least locally). From the functions E and L we get two foliations
of $\mathring{\Delta}$ originally defined by Shahshahani.

The foliation of fibres $\mathcal{D}$ consists of the manifolds defined
implicitly by a choice of marginal distributions, i.e. a typical leaf
is of the form $E^{-1}(\{p^\alpha\})$ for $\{p^\alpha\}$ a fixed element of $\prod_\alpha \mathring{\Delta}_\alpha$. Since E
is linear the leaves of $\mathcal{D}$ are parallel convex sets in $\mathring{\Delta}$.

The transverse foliation $\mathcal{T}$ consists of the manifolds defined
implicitly by a choice of values for L. A typical leaf is of the
form $L^{-1}(u)$ where u is a fixed vector in $R^d$. Since the functions
$L^S_{ij}$ are nonlinear these leaves are curved.

Every point p of $\mathring{\Delta}$ is the intersection of a unique leaf
$\mathcal{D}_p$ of $\mathcal{D}$ (namely, $E^{-1}(E(p))$) and a unique leaf $\mathcal{T}_p$ of $\mathcal{T}$ (namely,
$L^{-1}(L(p))$). $\mathcal{D}_p$ consists of all (interior) distributions having the
same marginals as p. $\mathcal{T}_p$ consists of all distributions having the
same linkage numbers $L^S_{ij}$ as p. In particular, Cor. 2 implies that
the Wright manifold in $\mathring{\Delta}$, $\mathring{\Lambda}$, is a leaf of $\mathcal{T}$. In fact, $\mathring{\Lambda} = L^{-1}(0)$.

While we have defined the leaves implicitly, each one can be
described explicitly. On each leaf of $\mathcal{T}$, L is a constant and so
E restricts to a diffeomorphism of $\mathcal{T}_p$ with $\prod_\alpha \mathring{\Delta}_\alpha$. So the inverse
function $(E|\mathcal{T}_p)^{-1}$, if you could compute it, gives an explicit
coordinatization of $\mathcal{T}_p$ by $\prod_\alpha \mathring{\Delta}_\alpha$. For $\mathring{\Lambda}$ this function is given by the
formula (2.8). Similarly, $(L|\mathcal{D}_p)^{-1}$ coordinatizes the leaf $\mathcal{D}_p$ by $R^d$.
However, I am unable to actually compute these inverse functions.


The tangent space of a foliation at a point means the tangent
space of the leaf through the point. So we write $T_p\mathcal{D}$ or $T_p\mathcal{T}$ and mean
$T_p(\mathcal{D}_p)$ or $T_p(\mathcal{T}_p)$. We now state the result which relates this geometry
to epistasis.

4 Theorem : (a) For $p \in \mathring{\Delta}$ the tangent spaces $T_p\mathcal{D}$ and $T_p\mathcal{T}$ give a
perpendicular decomposition of $T_p\Delta$ with respect to the Shahshahani
metric $(\ ,\ )_p$, i.e. the two subspaces are orthogonal and every vector
X in $T_p\Delta$ can be written uniquely as the sum $X = X_d + X_t$ with
$X_d \in T_p\mathcal{D}$ and $X_t \in T_p\mathcal{T}$.

(b) $T_p\mathcal{D}$ is spanned by the vectors of the form $\bar\nabla_p L^S_{ij}$.

(c) $X = \sum_i p_i \xi_i \partial_i \in T_p\Delta$ lies in $T_p\mathcal{T}$ if and only if the
relative component vector $\xi$ has no epistasis, i.e. $\xi$ can be written in
the form (2.5).

This theorem interprets the first least squares approximations
of Sec 2 geometrically. First, since $X_d$ is perpendicular to $X_t$, Thm.
3.2 says that $X_t$ is the projection of X on $T_p\mathcal{T}$ orthogonal with
respect to $(\ ,\ )_p$. Now if $X = \sum_i p_i \eta_i \partial_i$ and $X_t = \sum_i p_i \xi_i \partial_i$, Prop.
4.3 and (c) above imply that $\xi$ is the zero-epistasis approximation
to $\eta$ with respect to the covariance inner product $(\ ,\ )_p$. The full
power of these results is not so much in their interpretation of the
static picture with p fixed as in their application to the dynamics
of changing p.

5 Corollary : Let $X(p) = \sum_i X_i(p)\partial_i = \sum_i p_i \xi_i(p)\partial_i$ be a vectorfield on
$\mathring{\Delta}$ with the $X_i(p)$'s and $\xi_i(p)$'s smooth functions of p. The following
conditions are equivalent:

($\mathcal{T}$a) X is everywhere tangent to $\mathcal{T}$, i.e. $X(p) \in T_p\mathcal{T}$.

($\mathcal{T}$b) The leaves of $\mathcal{T}$ are invariant manifolds for the
differential equation defined by X.

($\mathcal{T}$c) For all i, j and S the functions $L^S_{ij}$ are conserved
(remain constant) by the differential equation defined by X.

($\mathcal{T}$d) There exist $\sum_\alpha n_\alpha$ smooth functions $\varphi^\alpha_{i_\alpha}: \mathring{\Delta} \to R$ ($i_\alpha \in I_\alpha$)
such that the $n = \prod_\alpha n_\alpha$ functions $\xi_i: \mathring{\Delta} \to R$ ($i \in I = \prod_\alpha I_\alpha$) can be
written $\xi_i = \sum_\alpha \varphi^\alpha_{i_\alpha}$.

The following conditions are equivalent:

($\mathcal{D}$a) X is everywhere tangent to $\mathcal{D}$, i.e. $X(p) \in T_p\mathcal{D}$.

($\mathcal{D}$b) The leaves of $\mathcal{D}$ are invariant manifolds for the
differential equation defined by X.

($\mathcal{D}$c) The marginal distributions $p^\alpha$ are conserved by the
differential equation defined by X.
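The equivalence of the third and fourth conditions of the first list is easy to see for two loci with two alleles: by (4.7) the time derivative of $\ln p_i$ is $\xi_i - \bar\xi$, so the derivative of the single linkage function is the alternating sum of the $\xi_i$. The sketch below illustrates this with hypothetical frequency-dependent rate functions; the specific formulas for the contributions are made up for illustration only.

```python
import math

GAMETES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def L(p):
    """ln(p_00 p_11 / p_01 p_10): the linkage function for two loci, two alleles."""
    return (math.log(p[(0, 0)]) + math.log(p[(1, 1)])
            - math.log(p[(0, 1)]) - math.log(p[(1, 0)]))


def additive_xi(p):
    """Relative rates of the form phi1[i_1] + phi2[i_2] (hypothetical, frequency dependent)."""
    pA = p[(0, 0)] + p[(0, 1)]
    pB = p[(0, 0)] + p[(1, 0)]
    phi1 = [1.0 - pA, pA]
    phi2 = [0.5 * pB, 0.2]
    return {(a, b): phi1[a] + phi2[b] for (a, b) in GAMETES}


def epistatic_xi(p):
    """The same rates plus an interaction term on one gamete."""
    xi = additive_xi(p)
    xi[(0, 0)] += 0.3
    return xi


def L_dot(xi):
    """dL/dt under dp_i/dt = p_i (xi_i - mean): the mean cancels in the alternating sum."""
    return xi[(0, 0)] + xi[(1, 1)] - xi[(0, 1)] - xi[(1, 0)]


def euler_step(p, xi, h):
    """One Euler step of (4.7)."""
    xi_bar = sum(p[g] * xi[g] for g in GAMETES)
    return {g: p[g] + h * p[g] * (xi[g] - xi_bar) for g in GAMETES}


p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
h = 1e-6
rate_additive = L_dot(additive_xi(p))
rate_epistatic = L_dot(epistatic_xi(p))
finite_diff = (L(euler_step(p, epistatic_xi(p), h)) - L(p)) / h
```

With additive rates the linkage function has zero derivative; with the interaction term its derivative equals the size of that term, and the finite-difference quotient agrees to first order in h.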

Since frequency-dependent selection is just some vectorfield
X on $\mathring{\Delta}$ the conditions ($\mathcal{T}$a) - ($\mathcal{T}$d) explain what zero-epistasis means
for frequency-dependent selection. Since the $L^S_{ij}$'s are conserved
their antilogs $p_i p_j / p_{\bar i} p_{\bar j}$ are also conserved. However,
their relatives $d^S_{ij} = p_i p_j - p_{\bar i} p_{\bar j}$ are not conserved except on $\mathring{\Lambda}$
where all of the $d^S_{ij}$'s and all of the $L^S_{ij}$'s vanish.

Now apply Prop. 3.3:

6 Corollary : Let $f: \mathring{\Delta} \to R$ be a smooth function. The gradient field
$\bar\nabla f$ is everywhere tangent to $\mathcal{T}$, i.e. $\bar\nabla_p f \in T_p\mathcal{T}$ for all $p \in \mathring{\Delta}$, if and
only if f is constant on the leaves of $\mathcal{D}$. These conditions hold
if and only if f depends only on the marginal distributions, i.e.
there exists a smooth function $f_0: \prod_\alpha \mathring{\Delta}_\alpha \to R$ such that f is the
composition $f_0 \circ E$.

Similarly, $\bar\nabla f$ is everywhere tangent to $\mathcal{D}$ if and only if f
depends only on the $L^S_{ij}$'s.

These results all generalize to higher levels of epistasis.
Let K be a fixed complex of loci as defined in Sec. 2. We define $E^K$
to be the restriction to $\Delta$ of the product of the maps $E^S$ for $S \in K$
where $E^S$ is defined by (0.2). So when we know $E^K(p)$ we not only know
the probabilities of the various genes but also the linkage among
loci in S where S is any bloc in K. $E^K$ maps $\Delta$ linearly onto
some convex set denoted $\Delta_K$ and maps $\mathring{\Delta}$ onto its interior $\mathring{\Delta}_K$. Again
the component functions are of the form $E^a(p)$ and there exist
functions of the form $L^b(p)$ which play the role of the $L^S_{ij}$'s. These
functions are computable but they get pretty messy. Choosing a
maximal independent set we define a function $L^K: \mathring{\Delta} \to R^{d(K)}$. Again
$E^K \times L^K: \mathring{\Delta} \to \mathring{\Delta}_K \times R^{d(K)}$ is a diffeomorphism. The gradients of the $E^K$
and $L^K$ coordinate functions are orthogonal. So we get two orthogonal
foliations $\mathcal{D}^K$ and $\mathcal{T}^K$. Most important, $X = \sum_i p_i \xi_i \partial_i \in T_p\Delta$ lies in
$T_p\mathcal{T}^K$ if and only if the relative component vector $\xi$ has K type
epistasis. Apply these results with K equal to the increasing skeleta
of L. We get the geometrical interpretation of the
partition of variance in Sec 2. As K increases the leaves of $\mathcal{T}^K$
get thicker (increase in dimension) and the leaves of $\mathcal{D}^K$ get thinner
to compensate.

The analogue of Cor. 5 also holds as does:

7 Corollary : Let $f: \mathring{\Delta} \to R$ be a smooth function. The following are
equivalent:

(a) The gradient field $\bar\nabla f$ is everywhere tangent to $\mathcal{T}^K$.

(b) The components of the usual gradient, $\partial f/\partial x_i$, have K type
epistasis at every point p of $\mathring{\Delta}$.

(c) f is constant on the leaves of $\mathcal{D}^K$.

(d) f depends only on the partial distributions $p^S \in \Delta_S$ for
$S \in K$, i.e. there exists $f_K: \mathring{\Delta}_K \to R$ such that $f = f_K \circ E^K$.

K K K 


6. The Selection Field

Mean fitness is the function $\bar m: \Delta \to R$ defined by $\bar m = \sum_{i,j} p_i p_j m_{ij}$
where $m_{ij}$ is the fitness or Malthusian parameter of zygotic genotype
ij. $\bar m$ extends to the quadratic function $\sum_{i,j} x_i x_j m_{ij}$ on $R^I$. Using
this extension a direct computation using (4.12) shows that the $\mathring{\Delta}$
gradient of $\tfrac{1}{2}\bar m$ with respect to the Shahshahani metric is given by:

(6.1)    $\bar\nabla_p(\tfrac{1}{2}\bar m) = \sum_i p_i (m_i - \bar m)\, \partial_i, \qquad p \in \mathring{\Delta}.$

Here $m_i = \sum_j p_j m_{ij}$ is the average fitness of gametic genotype i.

The differential equation associated with this vectorfield on
$\mathring{\Delta}$ is (1.4). So we call $\bar\nabla(\tfrac{1}{2}\bar m)$ the selection field on $\mathring{\Delta}$. The
observation that the selection field is the gradient of mean fitness
(times 1/2) is Shahshahani's instant proof of the Kimura maximum
principle.

For Fisher's fundamental theorem let $a_{ij}$ be any metric trait
depending on the zygotic genotype. $\bar a = \sum_{i,j} p_i p_j a_{ij}$ is the mean when
the gamete distribution is at $p \in \Delta$, and $a_i = \sum_j p_j a_{ij}$ is the mean or
average value of the trait when one gamete in the zygote is i. The
change in $\bar a$ as p changes due to the selection field is given by:

(6.2)    $\dfrac{d\bar a}{dt}\Big|_p = d_p \bar a\,(\bar\nabla_p(\tfrac{1}{2}\bar m)) = (\bar\nabla_p \bar a, \bar\nabla_p(\tfrac{1}{2}\bar m))_p$
                $= 2 \sum_i p_i (m_i - \bar m)(a_i - \bar a) = 2 \operatorname{Cov}_p(m_i, a_i).$

In detail, if p(t) is a solution curve of (1.4) then the derivative of
$\bar a(p(t))$ with respect to t is equal to the derivative of the function
$\bar a$ applied to the tangent vector p'(t). This is the chain rule. But
by (3.25) the tangent vector p'(t) is just the vectorfield of the
differential equation, in this case $\bar\nabla(\tfrac{1}{2}\bar m)$, at p(t). The remaining
equations follow from the definition of gradient, (3.26), and the
computation of $\bar\nabla \bar a$ just like (6.1). In sum, the rate of change of the
mean of a metric trait under selection alone is given by the genic
covariance of the trait with fitness. When $a_{ij} = m_{ij}$ we have:

(6.3)    $\dfrac{d\bar m}{dt}\Big|_p = 2\, \|\bar\nabla_p(\tfrac{1}{2}\bar m)\|_p^2 = 2 \sum_i p_i (m_i - \bar m)^2 = 2 \operatorname{Var}_p(m_i).$

This is positive except where the vectorfield itself vanishes, i.e.
except at equilibria.

The expression on the right of (6.3) is usually called the
additive variance of fitness.
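Equation (6.3) can be checked numerically by differentiating mean fitness along the selection field (6.1). The sketch below does this for a single locus with three alleles; the fitness matrix and the starting distribution are hypothetical values chosen only for illustration.

```python
def mean_fitness(p, m):
    """Mean fitness: sum over i, j of p_i p_j m_ij."""
    n = len(p)
    return sum(p[i] * p[j] * m[i][j] for i in range(n) for j in range(n))


def gametic_fitness(p, m):
    """m_i = sum over j of p_j m_ij."""
    n = len(p)
    return [sum(p[j] * m[i][j] for j in range(n)) for i in range(n)]


def selection_field(p, m):
    """(6.1): components p_i * (m_i - mean fitness)."""
    m_i = gametic_fitness(p, m)
    m_bar = mean_fitness(p, m)
    return [p_k * (m_k - m_bar) for p_k, m_k in zip(p, m_i)]


def genic_variance(p, m):
    """Var_p(m_i), the variance of the gametic fitnesses."""
    m_i = gametic_fitness(p, m)
    m_bar = mean_fitness(p, m)
    return sum(p_k * (m_k - m_bar) ** 2 for p_k, m_k in zip(p, m_i))


m = [[1.0, 0.7, 0.2],
     [0.7, 0.5, 0.9],
     [0.2, 0.9, 0.3]]  # hypothetical symmetric fitness matrix
p = [0.5, 0.3, 0.2]
X = selection_field(p, m)

# Central difference of mean fitness along the direction X.
h = 1e-5
forward = [a + h * b for a, b in zip(p, X)]
backward = [a - h * b for a, b in zip(p, X)]
rate = (mean_fitness(forward, m) - mean_fitness(backward, m)) / (2 * h)
```

Because mean fitness is quadratic, the central difference is exact up to rounding, so `rate` should agree with `2 * genic_variance(p, m)` very closely.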

Suppose there is only one locus with n alleles. We say that 

fitness is additive or dominance-free if $m_{ij}$ is the sum of a
contribution from each gamete, i.e.

(6.4)    $m_{ij} = k_i + k_j.$

Notice that in Sec. 2 we used additivity to mean the absence of
interaction between loci in the expression of some gametic trait, i.e.
no epistasis. Here we refer to the absence of interaction between
homologous genes in the expression of a zygotic trait, i.e. no dominance.
If (6.4) holds then letting $\bar k = \sum_i p_i k_i$, we have:

(6.5)    $\bar m = 2\bar k, \qquad m_i = k_i + \bar k, \qquad m_i - \bar m = k_i - \bar k.$

If $m_{ij}$ does have dominance then with $p \in \Delta$ we can define the
best dominance-free approximation to $m_{ij}$ to be $m_i + m_j - \bar m$. Note
that these values depend on p which we regard as fixed. So if we
let $\theta_{ij} = m_{ij} - m_i - m_j + \bar m$, we have

(6.6)    $m_{ij} = \bar m + (m_i - \bar m) + (m_j - \bar m) + \theta_{ij}$ (p fixed in $\Delta$).

This fits into the linear algebra framework of Thm. 3.2.
$\{p_i p_j\}$ defines a product distribution on the set of (ordered) zygotic
genotypes $I \times I$. $m_i + m_j - \bar m$ is the projection of the vector $m_{ij}$ on
the set of dominance-free vectors. The projection is orthogonal with
respect to the inner product analogous to $(\ ,\ )_p$, i.e.
$(m_{ij}, n_{ij}) = \sum_{i,j} p_i p_j m_{ij} n_{ij}$. This orthogonality means the following
formula which holds because $\theta_i = \sum_j p_j \theta_{ij} = 0$ for all i:

(6.7)    $\sum_{i,j} p_i p_j (a_i + b_j)\, \theta_{ij} = 0 \qquad (a, b \in R^I).$

As a consequence of this orthogonality, if we define the zygotic
variance of fitness:

(6.8)    $V_Z = \sum_{i,j} p_i p_j (m_{ij} - \bar m)^2$

then $V_Z$ is the sum of the variance of the dominance-free approximation
and the variance of the error term $\theta_{ij}$. The first is called the
additive variance and the second is called the dominance variance.
Since $\bar\theta = 0$, we have:

(6.9)    $V_Z = V_A + V_D, \qquad V_A = 2 \sum_i p_i (m_i - \bar m)^2, \qquad V_D = \sum_{i,j} p_i p_j \theta_{ij}^2.$

Thus, the additive part of the zygotic variance of fitness is twice
the gametic variance.

The simplicity of these formulae comes from the fact that the
zygotic distribution is a product. So the Hardy-Weinberg condition
here plays the role that linkage equilibrium plays in problems of
epistasis. When the Hardy-Weinberg condition does not hold the
formulae for the dominance-free approximation become more complicated
(see [6, Sec. 4.1]).
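The decomposition of the zygotic variance into additive and dominance parts can be verified directly from the definitions in (6.6) and (6.8). The following Python sketch computes the three variances for a hypothetical fitness matrix and also for a dominance-free one built from (6.4); all numerical values are illustrative assumptions.

```python
def variance_components(p, m):
    """Return (V_Z, V_A, V_D) for gamete distribution p and zygotic fitnesses m."""
    n = len(p)
    m_i = [sum(p[j] * m[i][j] for j in range(n)) for i in range(n)]
    m_bar = sum(p[i] * m_i[i] for i in range(n))
    # Dominance deviations theta_ij of (6.6).
    theta = [[m[i][j] - m_i[i] - m_i[j] + m_bar for j in range(n)]
             for i in range(n)]
    V_Z = sum(p[i] * p[j] * (m[i][j] - m_bar) ** 2
              for i in range(n) for j in range(n))
    V_A = 2 * sum(p[i] * (m_i[i] - m_bar) ** 2 for i in range(n))
    V_D = sum(p[i] * p[j] * theta[i][j] ** 2
              for i in range(n) for j in range(n))
    return V_Z, V_A, V_D


p = [0.5, 0.3, 0.2]
m = [[1.0, 0.7, 0.2],
     [0.7, 0.5, 0.9],
     [0.2, 0.9, 0.3]]  # hypothetical fitnesses with dominance
V_Z, V_A, V_D = variance_components(p, m)

k = [0.1, 0.4, 0.25]   # hypothetical gametic contributions for (6.4)
additive_m = [[k[i] + k[j] for j in range(3)] for i in range(3)]
W_Z, W_A, W_D = variance_components(p, additive_m)
```

In the dominance-free case the dominance variance vanishes and the zygotic variance is entirely additive.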

Now suppose that $m_{ij}$ was dominance-free to begin with, i.e.
(6.4) holds. Then $\tfrac{1}{2}\bar m = \bar k$ is a linear function. If all of the $k_i$'s
are distinct then $\bar k$ has a unique maximum on $\Delta$, namely fixation at
the allele with the largest $k_i$, and so the selection field $\bar\nabla \bar k$ tends to
that vertex in $\Delta$. In this case, however, we can explicitly solve
system (1.4):

(6.10)    $\dfrac{d \ln (p_i/p_j)}{dt} = k_i - k_j \qquad i, j \in I.$

Since $k_i - k_j$ is a constant, we get that $p_i/p_j$ increases exponentially
at rate $k_i - k_j$. So the vector $p(t) = \{p_i(t)\}$ is proportional to
$\{p_i(0) e^{k_i t}\}$ or:

(6.11)    $p_i(t) = p_i(0)\, e^{k_i t} \Big/ \sum_j p_j(0)\, e^{k_j t}.$
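The closed form (6.11) can be compared against a direct numerical integration of $dp_i/dt = p_i(k_i - \bar k)$, which is what the selection field (6.1) becomes under (6.5). The sketch below uses a classical fourth-order Runge-Kutta step; the integrator, step count, and parameter values are assumptions made for illustration.

```python
import math


def closed_form(p0, k, t):
    """(6.11): p_i(t) proportional to p_i(0) * exp(k_i * t)."""
    weights = [p * math.exp(k_i * t) for p, k_i in zip(p0, k)]
    total = sum(weights)
    return [w / total for w in weights]


def field(p, k):
    """Dominance-free selection field: p_i * (k_i - mean of k)."""
    k_bar = sum(p_i * k_i for p_i, k_i in zip(p, k))
    return [p_i * (k_i - k_bar) for p_i, k_i in zip(p, k)]


def rk4(p0, k, t, steps=1000):
    """Integrate dp/dt = field(p, k) from 0 to t with classical RK4."""
    h = t / steps
    p = list(p0)
    for _ in range(steps):
        s1 = field(p, k)
        s2 = field([a + 0.5 * h * b for a, b in zip(p, s1)], k)
        s3 = field([a + 0.5 * h * b for a, b in zip(p, s2)], k)
        s4 = field([a + h * b for a, b in zip(p, s3)], k)
        p = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(p, s1, s2, s3, s4)]
    return p


p0 = [0.6, 0.1, 0.3]   # hypothetical initial allele frequencies
k = [0.1, 0.5, 0.3]    # hypothetical gametic contributions
exact = closed_form(p0, k, 2.0)
numeric = rk4(p0, k, 2.0)
late = closed_form(p0, k, 50.0)
```

At large times the closed form concentrates on the allele with the largest $k_i$, as stated in the text.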
This case illustrates a difficulty with the Shahshahani metric.

It is only defined for the interior distributions $\mathring{\Delta}$. Often however
the boundary $\Delta - \mathring{\Delta}$ is exactly where we are looking, in problems about
fixation of one allele, for example. Theorem 4.1 provides a way
around this limitation:


1 Theorem: Let $f: S_2 \to \Delta$ by f(z) = p with $p_i = (z_i/2)^2$ be the map of
Thm. 4.1. Define $M: S_2 \to R$ by $M(z) = \sum_{i,j} z_i^2 z_j^2 m_{ij}$. With the usual
metric on $S_2$ the gradient vectorfield $\nabla(M/8)$ defines a differential
equation on $S_2$ and f maps the solutions of this differential
equation to solutions of the selection equation (1.4) on $\Delta$.

Proof : On $S_2 \cap P$, f is an isometry and so its derivative relates
the gradient of M/8 with $\bar\nabla(\tfrac{1}{2}\bar m)$ on $\mathring{\Delta}$. By continuity the derivative
relates $\nabla(M/8)$ on the rest of $S_2 \cap \bar P$ with equation (1.4) on the rest
of $\Delta$. The extension to all of $S_2$ has to do with the invariance of
M under change of sign of the coordinates $z_i$. This is a technical
argument which we will sketch below. If two vectorfields are related
by the derivative of f they are called f related and it then
follows from the chain rule that f maps solutions of one to
solutions of the other. QED


This result is a special case of a general method of changing
problems at the boundary of $\Delta$ to problems with symmetry on $S_2$.
Here I have to become technical. The group $(Z_2)^n$ acts on $S_2$ by
$\epsilon_i(z_1,\ldots,z_n) = (z_1,\ldots,-z_i,\ldots,z_n)$. The quotient of $S_2$ under this
action is $\Delta$ and $f: S_2 \to \Delta$ is essentially the quotient map. The
general form of a vectorfield on $\Delta$ which is parallel to the faces
is $X(p) = \sum_i p_i \xi_i(p)\partial_i$ with $\sum_i p_i \xi_i(p) = 0$. This is the general form of
frequency dependent selection and the parallelism condition means
that new types of gametes cannot be produced by selection alone. In
contrast, recombination and mutation do produce new types of gametes
and don't satisfy the parallel condition. Such a vectorfield is f
related to a unique vectorfield $Y(z) = \sum_i z_i \eta_i(z)\partial_i$ on $S_2$ which is
invariant under the group action. Conversely such a vectorfield Y
folds up to give a unique vectorfield X on $\Delta$ parallel to the faces.
Finally, since the functions $\eta_i$ are invariant under the group action
it is an exercise in singularity theory [3, p. 64, exercise 1] to show
that $\eta_i$ depends on the squares of the $z_i$'s. So the study of frequency
dependent selection on $\Delta$ is equivalent to the study of vectorfields
on $S_2$ invariant under the action of $(Z_2)^n$. However, we won't pursue
these boundary problems further here.


Since the selection field is a gradient, Cors. 5.6 and 5.7
apply to it. The result is a bit sharper in the completely symmetric
case. Recall from Sec. 2 that $m_{ij}$ is completely symmetric if
$m_{ij} = m_{\bar i \bar j}$ for all $i, j \in I$ and $S \subset L$.
ID ID 
gene) distributions $p^\alpha$.

(e) Fitness is the sum of contributions from the separate
loci, i.e. for $\alpha = 1,\ldots,\ell$ there exist symmetric, real vectors
$m^\alpha \in R^{I_\alpha \times I_\alpha}$ such that

(6.12)    $m_{ij} = \sum_\alpha m^\alpha(i_\alpha, j_\alpha).$

3 Corollary : If the selection field has no epistasis then the Wright
manifold of distributions in linkage equilibrium is an invariant
manifold for selection.

The above result generalizes to define K-type epistasis by:
(a) $\bar\nabla(\tfrac{1}{2}\bar m)$ leaves $\mathcal{T}^K$ invariant, (b) $\bar m$ at p depends on the partial
distributions $p^S$ for S in K, i.e. mean fitness depends only on
the joint distribution on blocs of loci in K, and (c) $m_{ij}$ can be
written as the sum of terms $m^S(i_S, j_S)$ for S in K, terms each
depending only on the genes in a bloc of K.

For any K there is a particular leaf $\Lambda_K$ of $\mathcal{T}^K$ which is
analogous to the Wright manifold $\mathring{\Lambda}$ in $\mathcal{T}$. For any leaf of the fibre
foliation $\mathcal{D}^K$ the point of intersection with $\Lambda_K$ is the distribution of
greatest randomness in the leaf. If every locus of L is in some
bloc of K then $\Lambda_K$ contains $\mathring{\Lambda}$. While in the presence of epistasis
$\mathring{\Lambda}$ is not invariant, if the epistasis is K type then $\Lambda_K$ is invariant.

The most important special case of K is the disjoint bloc
model $K = T_1 \vee \cdots \vee T_{\ell'}$ of Sec. 2. Here $\Lambda_K$ consists of the
distributions satisfying (5.3) for $S = T_1, T_2, \ldots, T_{\ell'}$. By an argument
analogous to Cor. 5.2, $p \in \Lambda_K$ for $K = T_1 \vee \cdots \vee T_{\ell'}$ if and only if p
is the product distribution from the factors $I_{T_1}, \ldots, I_{T_{\ell'}}$:

(6.13)    $p_i = p^{T_1}(i_{T_1}) \cdots p^{T_{\ell'}}(i_{T_{\ell'}}).$

7. The Recombination Field.

Following equation (1.6) we define the recombination field
associated to $S \subset L$ by:

(7.1)    $R^S = \sum_{i,j} (r^S_{ij} b_{ij} p_i p_j - r^S_{\bar i \bar j} b_{\bar i \bar j} p_{\bar i} p_{\bar j})\, \partial_i.$

Recall that $\bar i$ and $\bar j$ are $i_S j_{\bar S}$ and $j_S i_{\bar S}$ respectively. The
differential equation (1.6) comes from the vectorfield -R where

(7.2)    $R = \sum \{R^S : S \subset L\}.$


Shahshahani observed that the recombination fields are all
"vertical" vectorfields, that is, they are everywhere tangent to the
fibre foliation $\mathcal{D}$ of Sec. 5. To see this, compute the gradient of
$L^S_{ij}$ defined by (5.2). Using Table 4 of Sec. 4 we get:

(7.3)    $\bar\nabla L^S_{ij} = \partial_i - \partial_{\bar i} - \partial_{\bar j} + \partial_j.$

Now fix i, j and S and collect the eight terms in the formula for
$R^S$ which involve the pairs ij and $\bar i \bar j$. We discover that $r^S_{ij} b_{ij} p_i p_j$
occurs as the coefficient of $\partial_i$, $\partial_j$, $-\partial_{\bar i}$ and $-\partial_{\bar j}$ while $r^S_{\bar i \bar j} b_{\bar i \bar j} p_{\bar i} p_{\bar j}$
occurs as the coefficient of $-\partial_i$, $-\partial_j$, $\partial_{\bar i}$ and $\partial_{\bar j}$. So we get:

(7.4)    $R^S = \tfrac{1}{4} \sum_{i,j} (r^S_{ij} b_{ij} p_i p_j - r^S_{\bar i \bar j} b_{\bar i \bar j} p_{\bar i} p_{\bar j})\, \bar\nabla L^S_{ij}.$

The factor of 1/4 is to compensate for the four identical terms here
associated with the pairs ij, ji, $\bar i \bar j$ and $\bar j \bar i$.

Since the recombination fields $R^S$ and R are everywhere linear
combinations of the $\bar\nabla L^S_{ij}$, Thm. 5.4 (b) implies that these fields are
everywhere tangent to $\mathcal{D}$, i.e. satisfy ($\mathcal{D}$a) - ($\mathcal{D}$c) of Cor. 5.5. This just
says that the gene frequencies at each locus are left unchanged by
recombination.
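That recombination leaves the gene frequencies unchanged can be checked numerically in the simplest case. The sketch below evaluates the double sum defining the recombination field for two loci with S equal to the first locus, assuming constant (hence completely symmetric) coefficients r and b; the allele counts, the constants, and the sample distribution are hypothetical.

```python
from itertools import product

N1, N2 = 2, 3  # hypothetical numbers of alleles at the two loci
GAMETES = list(product(range(N1), range(N2)))


def recombination_field(p, r=0.3, b=1.0):
    """Components of R^S for S = {locus 1}, with constant r and b (an assumption)."""
    R = {}
    for i in GAMETES:
        total = 0.0
        for j in GAMETES:
            i_bar = (i[0], j[1])  # i's gene at locus 1, j's gene at locus 2
            j_bar = (j[0], i[1])
            total += r * b * (p[i] * p[j] - p[i_bar] * p[j_bar])
        R[i] = total
    return R


def marginal_rates(R):
    """Rates of change of the allele frequencies at each locus."""
    rate1 = [sum(R[(a, c)] for c in range(N2)) for a in range(N1)]
    rate2 = [sum(R[(a, c)] for a in range(N1)) for c in range(N2)]
    return rate1, rate2


weights = [3.0, 1.0, 2.0, 1.0, 4.0, 5.0]
total_weight = sum(weights)
p = {g: w / total_weight for g, w in zip(GAMETES, weights)}
R = recombination_field(p)
rate1, rate2 = marginal_rates(R)
```

Summing over j in the double sum also shows that, in this special case, each component of the field is r·b times the difference between $p_i$ and the product of its two marginal frequencies.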

As we saw in equation (1.7) the form of the recombination
fields is simplified when $r^S_{ij}$ and $b_{ij}$ are completely symmetric. In
that case we have (cf. (5.1)):

(7.5)    $R^S = \sum_{i,j} r^S_{ij} b_{ij} (p_i p_j - p_{\bar i} p_{\bar j})\, \partial_i$
              $= \tfrac{1}{4} \sum_{i,j} r^S_{ij} b_{ij}\, d^S_{ij}\, \bar\nabla L^S_{ij}$
              $= \tfrac{1}{4} \sum_{i,j} r^S_{ij} b_{ij} (p_i p_j - p_{\bar i} p_{\bar j})\, \bar\nabla L^S_{ij}.$

As Prop. 5.1 indicates, $d^S_{ij}$ and $L^S_{ij}$ are measuring essentially
the same thing. Notice that if we replace $d^S_{ij}$ by $L^S_{ij}$ in (7.5) we
get constant linear combinations of terms like:

(7.6)    $L^S_{ij}\, \bar\nabla L^S_{ij} = \bar\nabla(\tfrac{1}{2}(L^S_{ij})^2).$

So the resulting vectorfield is a gradient field. However, $d^S_{ij}$ is not
a function of the $L^S_{ij}$'s alone and so $R^S$ is not a gradient field even
in the completely symmetric case. We will see in Sec. 9 that this
has profound consequences for the dynamics of selection plus
recombination.

Despite the fact that the fields $R^S$ are not gradients, there
does exist a global Lyapunov function for the recombination field -R on
$\mathring{\Delta}$. It is entropy H:

(7.7)    $H(p) = -\sum_i p_i \ln p_i.$

More precisely, we normalize H by subtracting the sum of the
entropies of the gene frequencies:

(7.8)    $\bar H(p) = H(p) - \sum_\alpha H(p^\alpha).$

Recall that $p^\alpha$ is the marginal distribution induced by p on the
alleles $I_\alpha$ at the $\alpha$ locus. $\bar H$ turns out to be everywhere nonpositive
on $\Delta$, vanishing precisely on the Wright manifold $\Lambda$.

1 Theorem : Assume $r^S_{ij}$ and $b_{ij}$ are completely symmetric. Let p(t) be
a path in $\mathring{\Delta}$ associated to the recombination vectorfield -R. That
is, p(t) is a solution of the recombination equation (1.7). Then

(7.9)    $\dfrac{d\bar H}{dt}\Big|_p = (-R, \bar\nabla_p \bar H)_p \geq 0$ for all $p \in \mathring{\Delta}$.

Furthermore, $(-R, \bar\nabla_p \bar H)_p$ vanishes at p if and only if the vectorfield
R is zero at p, i.e. p is an equilibrium for equation (1.7). R
vanishes on the Wright manifold and $\bar\nabla \bar H$ vanishes exactly on the Wright
manifold.

We prove this theorem in Chap. III together with a related
result for certain kinds of position effects.
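The monotone behaviour of the normalized entropy can be observed numerically in the two-locus-two-allele case. The sketch below is only an illustration, not the proof: it assumes constant r and b, for which the double sum defining the field reduces to r·b times the deviation of each $p_i$ from the product of its marginals, and it integrates dp/dt = -R with plain Euler steps. All names and values are hypothetical.

```python
import math
from itertools import product

GAMETES = list(product(range(2), range(2)))


def entropy(values):
    """Shannon entropy -sum v ln v of a list of probabilities."""
    return -sum(v * math.log(v) for v in values if v > 0)


def marginals(p):
    """Marginal allele distributions at the two loci."""
    p1 = [sum(p[(a, c)] for c in range(2)) for a in range(2)]
    p2 = [sum(p[(a, c)] for a in range(2)) for c in range(2)]
    return p1, p2


def normalized_entropy(p):
    """Entropy of p minus the entropies of its marginals, cf. (7.8)."""
    p1, p2 = marginals(p)
    return entropy(list(p.values())) - entropy(p1) - entropy(p2)


def euler_step(p, h, r=0.3, b=1.0):
    """One Euler step of dp/dt = -R in the constant-coefficient two-locus case."""
    p1, p2 = marginals(p)
    return {g: p[g] - h * r * b * (p[g] - p1[g[0]] * p2[g[1]]) for g in GAMETES}


p = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.15, (1, 1): 0.35}
trajectory = [normalized_entropy(p)]
for _ in range(200):
    p = euler_step(p, 0.1)
    trajectory.append(normalized_entropy(p))
```

Along the computed path the normalized entropy stays nonpositive and does not decrease, approaching zero as the distribution approaches the product of its (unchanged) marginals.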

One point left open in the theorem is the question whether R
can vanish off the Wright manifold, i.e. are there any equilibria
other than distributions satisfying (2.8). Assuming that $r^S_{ij} b_{ij} > 0$
for all i, j and S, R vanishes precisely on $\Lambda$. However, if too
many $b_{ij}$'s are 0, aberrant equilibria can occur. I suspect that they
are not biologically significant. If too many zygote types are
sterile I conjecture that the population either becomes extinct or
else eliminates certain gamete types thus throwing the dynamics onto
some lower dimensional "face" of $\Delta$ on which only the usual linkage
equilibria occur. This Sterility Conjecture is also described in
Chap. III.


In relating recombination with selection, I originally hoped
that for each complex of loci K, the recombination field would be
tangent to the distinguished leaf $\Lambda_K$ of the transverse foliation $\mathcal{T}^K$.
This would then imply that if the selection field had K-type epistasis,
$\Lambda_K$ would be an invariant manifold for selection plus recombination.
However, this tangency condition is usually not true even when K is
a disjoint bloc model $T_1 \vee \cdots \vee T_{\ell'}$. At least in the disjoint bloc
case this is a shocking result.

In that case $\Lambda_K$ consists of distributions $p \in \mathring{\Delta}$ satisfying
(6.13), meaning that the loci on different blocs are independent.
There is an example where it represents a considerable violation of
biological intuition. We will examine this case in some detail since
at first glance it casts doubt on the validity of the entire model.

Consider the case when there are $\ell'$ chromosomes in the haploid
gamete and the bloc $T_\alpha$ consists of the loci on the $\alpha$-th chromosome
($\alpha = 1,\ldots,\ell'$). The distributions in $\Lambda_K$ are precisely those where
the loci on different chromosomes are independently distributed. Now
it is not surprising that selection might lead to "linkage" between
genes on different chromosomes, but we will see in Chap. III that
even if the birth rates, death rates and recombination rates are
genotype independent, i.e. there is no selection, R need not be
tangent to $\Lambda_K$. So recombination alone may destroy the independence
between chromosomes. But biological intuition going back to Mendel's
Law of Independent Assortment makes it hard to see how such
independence could be destroyed at all and especially by recombination
which tends to make all the loci independent. Perhaps the model is
wrong and a more accurate model would yield tangency in the disjoint
bloc case. But no, the model is right.

The explanation is simpler for a discrete time model. Of these 
there are two types: In the first kind, the annuals, this year's 
crop all die but are replaced by their offspring. In the second kind, 
the perennials, some of this year's crop die and the rest survive to 
join the offspring in forming next year's crop. If the model ignores 
the age structure of the population then the survivors and the young 
are regarded as indistinguishable members of the new crop. In that 
case the response to selection alone is essentially the same in the 
two kinds of model. All that matters are the net reproductive rates 
(= birth rates minus death rates). In the biological literature 
there is a large body of work about life-history tactics much of 
which has grown out of Cole's observation [5] that the switch from an 
annual strategy to a perennial strategy (in fact to immortality) is 
mathematically equivalent to increasing the birth rate by one. But 
once one introduces recombination the two models are quite different, 
for the offspring have been exposed to the effects of recombination 
while the survivors have not. The new crop is thus a mixture of two 
populations. It is an odd fact that two characters can be mathematically
independent in each of two populations and yet not be independent
in the combined population.

For a concrete example, consider a four locus, two allele model
with two loci on each chromosome. Imagine a large population with
very small per generation birth and death rates (and these are
independent of the genotypes). Suppose that the population begins with
A, B and a, b totally linked and in equal numbers on the first
chromosome and with C, D and c, d totally linked in equal numbers on
the second chromosome. Finally, assume that the two chromosomes are
independently distributed. Thus, out of the 16 possible gamete-types
the parental generation was constructed out of four possibilities:
AB or ab on the first chromosome and CD or cd on the second. The
probability of each of the four types is 1/4. Thus, the parental
generation is far from linkage equilibrium, $\Lambda$, but is on $\Lambda_K$ for the
disjoint bloc model with the two blocs equal to the two chromosomes.
Now look one year later supposing recombination rates $r_1$ and $r_2$
within the two chromosomes. Among the offspring all 16 gamete-types
appear and the offspring distribution is closer to $\Lambda$ (though not on
$\Lambda$ unless $r_1 = r_2 = 1/2$). Furthermore, since the chromosomes assort
independently the offspring distribution is on $\Lambda_K$. Meanwhile, the
distribution of the survivors is the same as the parental distribution,
since the death-rate is genotype independent, and so it is
still on $\Lambda_K$. However, for the entire population of survivors-plus-offspring
the two chromosomes are not independent. For choose a gamete
and suppose we discover Ab on the first chromosome. Then the
gamete must come from the offspring population and the probability
that Cd is on the second chromosome is $\tfrac{1}{2} r_2$. On the other hand if we
discover AB on the first chromosome then since the number of survivors
is very large and the number of offspring is very small, it is almost
certain that the gamete comes from the survivor population and so the
probability that Cd is on the second chromosome is essentially 0.
Since the state of the first chromosome provides information about the
state of the second, the two chromosomes are not independent.
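
The mixing argument can be checked numerically. The following is only a
sketch under stated assumptions: the one-generation rule used for the
offspring (each recombinant type on a chromosome has frequency r/2, so
that r = 1/2 gives the uniform distribution) and the names
`offspring_chromosome`, `product_dist`, `mix`, `conditional` and
`offspring_share` are illustrative choices, not taken from the text.

```python
def offspring_chromosome(parental, recombinant, r):
    # Assumed one-generation rule: the two parental types keep (1 - r)/2
    # each and the two recombinant types receive r/2 each.
    dist = {h: (1 - r) / 2 for h in parental}
    dist.update({h: r / 2 for h in recombinant})
    return dist


def product_dist(d1, d2):
    # Independent chromosomes: joint probability is the product of marginals.
    return {(h1, h2): p1 * p2 for h1, p1 in d1.items() for h2, p2 in d2.items()}


def mix(survivors, offspring, offspring_share):
    # New crop = survivors and offspring regarded as indistinguishable.
    keys = set(survivors) | set(offspring)
    return {
        k: (1 - offspring_share) * survivors.get(k, 0.0)
        + offspring_share * offspring.get(k, 0.0)
        for k in keys
    }


def conditional(joint, first, second):
    # P(second chromosome = `second` | first chromosome = `first`).
    marginal = sum(p for (h1, _), p in joint.items() if h1 == first)
    return joint.get((first, second), 0.0) / marginal


r1, r2 = 0.1, 0.2          # hypothetical recombination rates
offspring_share = 0.01     # hypothetical small fraction of offspring

survivors = product_dist({"AB": 0.5, "ab": 0.5}, {"CD": 0.5, "cd": 0.5})
offspring = product_dist(
    offspring_chromosome(["AB", "ab"], ["Ab", "aB"], r1),
    offspring_chromosome(["CD", "cd"], ["Cd", "cD"], r2),
)
population = mix(survivors, offspring, offspring_share)

p_Cd_given_Ab = conditional(population, "Ab", "Cd")  # offspring only
p_Cd_given_AB = conditional(population, "AB", "Cd")  # mostly survivors
print(p_Cd_given_Ab, p_Cd_given_AB)
```

Both component populations are product distributions, yet the two
conditional probabilities differ, which is exactly the failure of
independence described above.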

The vectorfield model is like a discrete model of the second 
kind and the constant mixing of survivor and offspring distributions 
is the reason that tangency fails. 


8. The Mutation Field. 


From equation (1.8) we define the mutation field:

(8.1)    $N = \sum_{i,j} p_i N_{ij}\,\partial_j.$

Here $N_{ij}$ is defined to be the matrix:

(8.2)    $N_{ij} = \begin{cases} n_{ij} & i \neq j \\ -n_{i*} & i = j. \end{cases}$

$n_{ij}$ is the relative rate at which i gametes are transformed to j
gametes by mutation ($i \neq j$). Unlike the recombination and selection
fields, N is a linear vectorfield. The coefficient of $\partial_j$ is a linear
function of p. Furthermore, $N_{ij} = n_{ij} \geq 0$ when $i \neq j$. We make the
reasonable assumption that any gamete j can be obtained from any
other gamete i by some finite sequence of mutational steps. Under
this assumption, the theory of positive matrices due to Frobenius
enables us to prove (in Chap. III) that the vectorfield N vanishes
at a unique point q in $\Delta$. q lies in $\mathring{\Delta}$ and can be directly
computed. Up to scalar multiple, chosen so that $\sum_i q_i = 1$, $q_i$ is the $i$th
principal minor of N (= the determinant of the $(n-1) \times (n-1)$ matrix
obtained by deleting the $i$th row and the $i$th column from N). This
equilibrium is globally asymptotically stable, meaning that every
solution path of (1.8) approaches q as time t tends to $\infty$. Furthermore,
we can estimate the eigenvalues of N to get an upper bound on
the rate at which the equilibrium is approached, namely no faster
than twice the total mutation rate $\max_i n_{i*}$. As the mutation rates
are usually quite small the mutation equilibrium is approached in a
rather leisurely fashion as one would expect.
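
The principal-minor formula for q is easy to test numerically. The sketch
below assumes that $n_{i*}$ is the total rate out of gamete i (the sum of
$n_{ij}$ over $j \neq i$), so that each row of N sums to zero, and that the
equilibrium condition is $\sum_i q_i N_{ij} = 0$ for every j. The rate values
and the helper names `mutation_matrix` and `equilibrium_from_minors` are
hypothetical.

```python
import numpy as np


def mutation_matrix(rates):
    # N_ij = n_ij for i != j and N_ii = -(sum of n_ij over j != i).
    N = np.array(rates, dtype=float)
    np.fill_diagonal(N, 0.0)
    np.fill_diagonal(N, -N.sum(axis=1))
    return N


def equilibrium_from_minors(N):
    # q_i is proportional to the determinant of N with row i and column i
    # deleted; dividing by the sum fixes both the scale and the common sign.
    n = N.shape[0]
    minors = np.array([
        np.linalg.det(np.delete(np.delete(N, i, axis=0), i, axis=1))
        for i in range(n)
    ])
    return minors / minors.sum()


# Hypothetical mutation rates among three gamete types.
rates = [[0.00, 0.01, 0.02],
         [0.03, 0.00, 0.01],
         [0.02, 0.04, 0.00]]
N = mutation_matrix(rates)
q = equilibrium_from_minors(N)

print("equilibrium q:", q)
print("residual q N :", q @ N)   # should vanish up to rounding
```

The residual `q @ N` is the mutation field evaluated at q, so a vanishing
residual confirms that the normalized principal minors give the
equilibrium for this example.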

Now suppose that mutation between gametes is due to the independent
occurrence of mutations at each locus. At the $\alpha$th locus,
let $n^\alpha_{i_\alpha j_\alpha}$ be the relative rate by which $i_\alpha$ alleles are transformed
to $j_\alpha$ alleles by mutation when $i_\alpha \neq j_\alpha$. We assume that the rate at
each locus is independent of the allelic values at the remaining loci.
We then get a matrix $\bar{N}^\alpha$ and an associated linear vectorfield on $\Delta_\alpha$
which we will also denote by $\bar{N}^\alpha$.

On $\Delta$ the vectorfield representing mutation at the $\alpha$th locus
is the linear vectorfield $N^\alpha$ associated with the matrix defined by

(8.3)    $N^\alpha_{ij} = \bar{N}^\alpha_{i_\alpha j_\alpha}\,\delta_{i_{\tilde\alpha} j_{\tilde\alpha}}.$

Here $\delta$ is the Kronecker delta and $i_{\tilde\alpha}$ is the projection of i to
the set of loci complementary to $\alpha$, or to $\tilde\alpha = L - \{\alpha\}$. This means
that i mutates to j at relative rate $n^\alpha_{i_\alpha j_\alpha}$ provided that i and
j agree at all loci other than the $\alpha$th locus and at that locus they
are different. The vectorfield N representing the effect of mutation
at all loci is the sum $N = \sum_\alpha N^\alpha$. It is the linear vectorfield
associated with the sum of the corresponding matrices.

When N has this special form the Wright manifold $\Lambda$ is an
invariant manifold for N and the equilibrium q lies in $\Lambda$. In
fact, q is the product distribution whose marginal distribution $q^\alpha$
is the equilibrium in $\Delta_\alpha$ for $\bar{N}^\alpha$. The rate of approach to q is at
most the minimum of the corresponding rates for the $q^\alpha$'s, expressing
the "bottleneck" effect whereby the rate of a complex process is
determined by the rate of the slowest component.
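
For two loci, (8.3) says that the gamete mutation matrix is a Kronecker
sum of the single-locus matrices. The sketch below checks the
product-equilibrium and bottleneck statements on one hypothetical example.
The flattening of a gamete $(i_1, i_2)$ to a single index, the rate values,
and the helper names `generator`, `stationary` and `slowest_rate` are
assumptions made for illustration only.

```python
import numpy as np


def generator(rates):
    # Single-locus matrix: off-diagonal rates, rows summing to zero.
    N = np.array(rates, dtype=float)
    np.fill_diagonal(N, 0.0)
    np.fill_diagonal(N, -N.sum(axis=1))
    return N


def stationary(N):
    # Solve q N = 0 together with sum(q) = 1 by least squares.
    n = N.shape[0]
    A = np.vstack([N.T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]


def slowest_rate(N):
    # Smallest decay rate: minus the largest nonzero real part of the
    # eigenvalues (the largest real part itself is the zero eigenvalue).
    real_parts = np.sort(np.linalg.eigvals(N).real)
    return -real_parts[-2]


# Hypothetical rates: two alleles at locus 1, three alleles at locus 2.
N1 = generator([[0.0, 0.02], [0.01, 0.0]])
N2 = generator([[0.0, 0.03, 0.01], [0.02, 0.0, 0.02], [0.01, 0.04, 0.0]])

# Gamete (i1, i2) is stored at index i1 * 3 + i2, so mutation at locus 1
# is kron(N1, I) and mutation at locus 2 is kron(I, N2).
N = np.kron(N1, np.eye(3)) + np.kron(np.eye(2), N2)

q = stationary(N)
q_product = np.kron(stationary(N1), stationary(N2))

print("max |q - q1 x q2| :", np.abs(q - q_product).max())
print("rates:", slowest_rate(N), slowest_rate(N1), slowest_rate(N2))
```

In this example the slowest decay rate of the combined matrix coincides
with the smaller of the two single-locus rates, which is the bottleneck
effect in its simplest form.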

Notice that in defining N we did not include any terms representing
simultaneous mutation at several loci. Why don't such terms
appear?

The reason goes back to the definition of mutation rate.
Beginning with $i \in I$, the probability that in a time interval of length
dt i will mutate to j which agrees with i at all loci but $\alpha$
is $p_i n^\alpha_{i_\alpha j_\alpha}\,dt$, or more precisely it is this plus an error term of the
form o(dt), meaning lim o(dt)/dt = 0 as dt approaches 0. Similarly
the probability that i will mutate to k which agrees with i at
all loci but $\beta \neq \alpha$ is $p_i n^\beta_{i_\beta k_\beta}\,dt + o(dt)$. Now if $l$ agrees with j
at $\alpha$, with k at $\beta$ and i everywhere else then the probability
that i will mutate to $l$ directly in the dt interval is the product,
$p_i n^\alpha_{i_\alpha j_\alpha} n^\beta_{i_\beta k_\beta}\,dt^2 + o(dt^2)$, assuming that mutations at different loci
are independent. This entire term is o(dt) and so the instantaneous
rate of such simultaneous mutations is zero.


9. The Combined Field.

By the combined field we mean the sum of the effects of selection,
recombination and mutation:

(9.1)    $\nabla(\tfrac{1}{2}\bar{m}) - R + N.$

Ideally, we would like to solve explicitly the differential equation
associated to this field. As this is impossible we would at least
like to give a portrait of how the solutions behave, a "qualitative"
description. We are particularly interested in the stable elements of
behavior. There are two quite different concepts of stability for
a vectorfield on a manifold.

Stability of a particular solution under perturbation of initial
conditions is called orbital stability. For example, an equilibrium
is called stable if for initial conditions near it the solutions tend
to return to the equilibrium or at least remain close to it. The
solution paths which are not orbitally stable are called separatrices.
Removing the separatrices cuts the manifold up into a number of
invariant open sets or domains of attraction. All of the solutions in
a domain behave similarly. For example, they may all tend to a
particular equilibrium or limit cycle. Crossing a separatrix to a
different domain changes the behavior, e.g. the solutions may now
approach a different equilibrium. The location of the separatrices
and description of the domains of attraction constitute the phase
portrait of the field.

Stability of the entire equation under perturbation of parameters
is called structural stability. If a vectorfield is not
structurally stable then an arbitrarily small change in some
coefficients or addition of some other small field (a perturbing term)
may completely change the phase portrait of the equation. If the
equation depends on some continuous parameter then the set of values
of the parameter at which the equation is not structurally stable is
called the bifurcation set. Removing the bifurcation set cuts the
parameter space into a number of open sets or regimes. All of the
equations with parameters in a regime have similar phase portraits.
Crossing the bifurcation set to a different regime causes a change
in phase portrait which Thom [34] calls a catastrophe.

The philosophical similarities between these two kinds of
stability are discussed by Abraham in [1]. In practice some
understanding of the qualitative behavior of an equation is needed even
when doing numerical solutions. It is necessary to know that the
finite set of points that one plots gives a reasonable picture of the
actual solution and that the solution itself is typical, i.e. is not
an artifact due to special choice of initial values or coefficients.

Some of the difficulties that arise in studying the combined 
field come from the fact that recombination is never, and multi-locus 
selection is rarely, structurally stable when considered alone. 

In Sec. 7, we saw that the recombination field -R is tangent
to the foliation $\mathcal{D}$. A point moving along the recombination field
remains in a fixed leaf, i.e. preserves the gene frequencies at each
locus. It moves down the leaf toward linkage equilibrium approaching
the unique point of the Wright manifold $\Lambda$ in the original leaf. The
tangency to a foliation and the occurrence of a whole submanifold of
equilibria are structurally unstable characteristics. However, by
invariant manifold theory (cf. [14]) the existence of the invariant
manifold itself is structurally stable. This means that if $-\tilde{R}$ is a
perturbation of -R, there exists a submanifold $\tilde\Lambda$ close to $\Lambda$ which
is invariant with respect to $-\tilde{R}$. If a point begins far from $\tilde\Lambda$ and
moves according to $-\tilde{R}$ it rapidly approaches the invariant manifold
$\tilde\Lambda$ with a slow change of gene frequencies at each locus. So in the
beginning it acts like a solution of -R. But near $\tilde\Lambda$ it begins to
move along $\tilde\Lambda$ in a fashion completely determined by the particular
perturbation, i.e. completely unpredictable from -R alone. This slow
transverse motion which was invisible at first when -R was large
becomes manifest near $\tilde\Lambda$ because there -R itself is near zero.

For selection, suppose the maximum value of mean fitness $\bar{m}$
occurs at a point in $\Delta$. If this point is an isolated maximum then
the gradient $\nabla(\tfrac{1}{2}\bar{m})$ is structurally stable. The point is an
asymptotically stable equilibrium and a perturbation of the selection
field has a unique asymptotically stable equilibrium nearby. This
picture is reasonable in a single locus model. In a multilocus model
it corresponds to complete epistasis. At the other extreme suppose
that $m_{ij}$ has zero epistasis. Then by Thm. 6.2 $\bar{m}$ is constant on
the leaves of $\mathcal{D}$. So the maximum value occurs on an entire leaf of
$\mathcal{D}$. Furthermore $\nabla(\tfrac{1}{2}\bar{m})$ is everywhere tangent to the leaves of $\mathcal{J}$ and
so a point moves along a fixed leaf of $\mathcal{J}$ toward the maximum of $\bar{m}$
on the leaf. For K type epistasis, $\nabla(\tfrac{1}{2}\bar{m})$ is tangent to the
foliation $\mathcal{J}_K$ and $\bar{m}$ is constant on leaves of $\mathcal{D}_K$. So here again we have
an invariant foliation and a submanifold of equilibria. Just as
before this portrait is not structurally stable.

The mutation field N is structurally stable, but the coefficients
are small enough that it is best considered just a perturbation
term added to $\nabla(\tfrac{1}{2}\bar{m}) - R$.

In the zero-epistasis case the behavior of the combined field
can be described completely. In that case the motion due to -R of a
point towards linkage equilibrium is never opposed by selection while
the latter moves the point toward the maximum of mean fitness. The
two motions are perpendicular. In particular,

(9.2)    $\frac{d\bar{m}}{dt} = (\nabla_p(\tfrac{1}{2}\bar{m}) - R,\ \nabla_p\bar{m})_p = (\nabla_p(\tfrac{1}{2}\bar{m}),\ \nabla_p\bar{m})_p - 0 = V_A \geq 0.$

So mean fitness increases monotonically. This is Ewens' extension
[9] of Fisher's fundamental theorem to multilocus models when there
is no epistasis.

The Wright manifold of linkage equilibria is an invariant manifold
for the combined field. Using the diffeomorphism $E: \mathring{\Lambda} \to \prod_\alpha \mathring{\Delta}_\alpha$ we
can think of selection and mutation acting on $\Lambda$ to affect the gene
frequencies at each locus. Writing $m_{ij}$ as the sum (6.12) and putting
the analogue of the Shahshahani metric on each $\Delta_\alpha$, the motion on $\Lambda$
due to the combined field is separated via E into $\ell$ non-interacting
fields: $\nabla(\tfrac{1}{2}\bar{m}^\alpha) + \bar{N}^\alpha$ on $\Delta_\alpha$. If each $\bar{m}^\alpha$ has an isolated maximum
in $\Delta_\alpha$ and the mutation rates are small then these fields are
structurally stable.

So in the zero epistasis case the combined field is usually
structurally stable. Recombination tends to move the state to
linkage equilibrium and selection plus mutation act to equilibrate
the gene frequencies at each locus. Which of these two motions is
faster depends upon the relative strengths of recombination and
selection. The two cases give different patterns when epistasis is
introduced.

Strong Recombination and Weak Selection: Here the combined
field is a perturbation of -R. When a small epistasis component is
included $\Lambda$ is no longer invariant but there is an invariant manifold
$\tilde\Lambda$ near $\Lambda$. Far from $\tilde\Lambda$ recombination is dominant and moves the
point close to $\tilde\Lambda$ with slight change of gene frequencies. Near $\tilde\Lambda$
motion is determined by selection and mutation. $E: \tilde\Lambda \to \prod_\alpha \Delta_\alpha$ is still
a diffeomorphism but now the changes in gene frequencies at the
separate loci are not independent. There are small interaction terms due
to epistasis and to the deviation from linkage equilibrium. This
motion "parallel" to $\Lambda$ is Shahshahani's description of quasi-linkage
equilibrium (see [28, Sec. 5.4]).

Strong Selection and Weak Recombination: Suppose $\bar{m}$ has an
interior maximum. If this maximum is isolated then the combined field
acts like a perturbation of a one-locus, n-allele model. Recombination
and mutation deflect the asymptotically stable equilibrium of the
combined field away from the maximum of $\bar{m}$ unless this maximum
happened to lie on $\Lambda$. If there is incomplete epistasis, say exactly
K-type so that the maxima of $\bar{m}$ occur exactly on a leaf of $\mathcal{D}_K$, then
selection alone would move a point toward this maximum leaf. The
combined field has an invariant manifold $\tilde{D}$ near this maximum leaf
which is bent toward $\Lambda$ if the original did not meet $\Lambda$. Far from
$\tilde{D}$ selection is dominant and moves the point close to $\tilde{D}$ with rapidly
increasing mean fitness and only a slow change in the recombination
values $L^S_{ij}$. Near $\tilde{D}$ motion is determined by recombination and
mutation with little change in mean fitness. I conjecture that motion
on $\tilde{D}$ will move asymptotically toward an equilibrium near the point
of $\tilde{D}$ of maximum entropy.

We should point out where the "strength" of R comes from. It
is usually felt that this is a matter of linkage. Indeed, very tight
linkage, $r^S \ll 1$, leads to weak recombination. But with moderate
linkage values the size of R tends to be determined by the birth
rates, $b_{ij}$. In most natural populations most of the time mean
fitness is about 0, i.e. exponential growth or decay must be transient.
So the fitness values $m_{ij}$ tend to be clustered around 0. Now since
$m_{ij} = b_{ij} - d_{ij}$, a moderate fitness value $m_{ij}$ can arise either through
moderate birth and death rates or through a high birth rate and a
compensating high death rate. The selection field doesn't
distinguish between these two possibilities. But the latter case leads to
strong recombination relative to selection.

The pictures in these two extreme cases are quite different. 

This suggests that the interesting intermediate cases might be quite 
complicated. What can one say when selection and recombination are 
of comparable strength? The best organizing principle is Wright's 
"adaptive surface" point of view [36]. 

In a paper titled "On the Nonexistence of Adaptive Topographies" 
[24] Moran constructed explicit examples in the two locus, two allele 
case which showed that under the combined effect of selection and 
recombination mean fitness need not be always increasing. With the 
benefit of hindsight it is easy to construct a large class of such 
examples. 

Suppose that $\bar{m}$ has an isolated maximum at some point p of
$\mathring{\Delta} - \Lambda$. Now add in a small recombination term as in the second case
above. The combined field has an equilibrium $\tilde{p}$ near p but since
$R \neq 0$ at p, $\tilde{p} \neq p$. p is an isolated maximum of $\bar{m}$, so $\nabla(\tfrac{1}{2}\bar{m})$ does
not vanish at $\tilde{p}$. Different solution curves for the combined field
will approach $\tilde{p}$ along different directions. If $\bar{m}$ is increasing
for a particular solution, then along a solution approaching from the
opposite direction $\bar{m}$ will be decreasing. If, for example, the
matrix $m_{ij}$ has distinct eigenvalues one can find enough approach
directions to make this argument work.

But the title of Moran's paper makes too strong a claim. The
adaptive surface point of view does not assume that evolution acts
to maximize mean fitness, just that it tends to maximize some
hypothetical "fitness function" [37]. For example, we have seen that
recombination acts to increase entropy while selection acts to
increase mean fitness. Perhaps the competing claims of selection and
recombination can be measured by some combination of mean fitness
and entropy which would increase along solutions of the combined field.
The general hope leads to the following:

The Wright Conjecture: The combined field admits a global
Lyapunov function. That is, there exists a "fitness function"
$F: \Delta \to \mathbb{R}$ such that dF/dt > 0 along solution paths other than
equilibria.

Ewens raises essentially the same question [10, p. 96].

For zero epistasis fields the above program usually works.
Note that $\bar{m}$ alone is not a Lyapunov function despite (9.2) because
on the $\mathcal{D}$ leaf of maximum fitness the combined field moves the point
toward $\Lambda$ while $\bar{m}$ remains constant. However, addition of a small
term related to entropy usually does give a Lyapunov function at
least locally. The details are in Chap. IV.

Since the zero epistasis case is pretty clear without a 
Lyapunov function, the real interest lies in the cases where there 
is epistasis. In general, the Wright Conjecture is false because 
periodic orbits or cycles can occur in such cases. Since a function 
can't be continually increasing as one goes around a cycle the Wright 
Conjecture fails in such cases. The existence of such cycles is 
proved in Chap. IV by the demonstration that a Hopf bifurcation can 
occur in the family of combined fields [22].

Certain kinds of bifurcation are consistent with the adaptive
surface picture and are familiar to geneticists. For example consider
the family of real differential equations depending on the real
parameter $\lambda$:

(9.3)    $\frac{dx}{dt} = -x^3 + \lambda x = -x(x^2 - \lambda).$

The vectorfield in (9.3) is the gradient of $f_\lambda$ where

(9.4)    $f_\lambda(x) = -x^4/4 + \lambda x^2/2 = -x^2(x^2 - 2\lambda)/4.$

When $\lambda \leq 0$, x = 0 is the only equilibrium of (9.3) and it is
asymptotically stable. When $\lambda > 0$, x = 0 is an unstable equilibrium but
there are two asymptotically stable equilibria at $x = \pm\sqrt{\lambda}$. In terms
of the potential or fitness functions: For $\lambda \leq 0$, $f_\lambda$ has only one peak
or local maximum, and that is at 0. For $\lambda > 0$ there are two peaks
at $\pm\sqrt{\lambda}$ separated by a trough, or local minimum, at 0. Thus, as $\lambda$
changes from negative to positive passing the bifurcation point $\lambda = 0$
we see the erosion and splitting of an adaptive peak into two. This
is the familiar metaphor for the early stage of speciation in the
adaptive surface picture.
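
The equilibria of (9.3) and their stability can be tabulated directly.
The sketch below is illustrative only; the function names `equilibria`
and `is_asymptotically_stable` are hypothetical, and stability is read
off from the sign of the derivative of the right-hand side, with the
degenerate case $\lambda = 0$, x = 0 handled separately.

```python
import math


def equilibria(lam):
    # Zeros of -x**3 + lam*x = -x*(x**2 - lam).
    if lam > 0:
        root = math.sqrt(lam)
        return [-root, 0.0, root]
    return [0.0]


def is_asymptotically_stable(x, lam):
    # Derivative of the right-hand side of (9.3) at the equilibrium x.
    slope = -3.0 * x * x + lam
    if slope != 0.0:
        return slope < 0.0
    # Among equilibria the slope vanishes only at x = 0 with lam = 0,
    # where dx/dt = -x**3 still pushes every solution toward 0.
    return True


for lam in (-1.0, 0.0, 4.0):
    report = [(x, is_asymptotically_stable(x, lam)) for x in equilibria(lam)]
    print("lambda =", lam, "->", report)
```

For negative $\lambda$ the table shows a single stable peak at 0; for
positive $\lambda$ the peak at 0 has become unstable and two stable peaks
appear at $\pm\sqrt{\lambda}$.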

But now consider the family of equations in $\mathbb{R}^2$ depending on a
real parameter $\lambda$:

(9.5)    $\frac{dx}{dt} = \lambda x - y - x(x^2 + y^2)$
         $\frac{dy}{dt} = x + \lambda y - y(x^2 + y^2).$

In polar coordinates $r^2 = x^2 + y^2$, $\tan\theta = y/x$ this family is:

(9.6)    $\frac{d\theta}{dt} = 1, \qquad \frac{dr}{dt} = -r(r^2 - \lambda) \quad (r \neq 0).$

Since $d\theta/dt = 1$ every point (except the origin) moves counter-clockwise
about the origin with unit angular velocity. Now if $\lambda \leq 0$, dr/dt
is always negative and so every point spirals inward to the origin.
So the origin is an asymptotically stable equilibrium when $\lambda \leq 0$.
But if $\lambda > 0$, dr/dt is positive for $r < \sqrt{\lambda}$ and so solutions beginning
inside this circle spiral outward. Solutions outside the circle still
spiral inward but not to an equilibrium. When $r = \sqrt{\lambda}$, dr/dt = 0 while
$d\theta/dt = 1$ and so the motion along the circle of radius $\sqrt{\lambda}$ is a limit
cycle solution of the equation. As in (9.3) the change of $\lambda$ from
negative to positive changes the stable equilibrium at the origin
to an unstable equilibrium. But now instead of an adaptive peak
splitting in two a limit cycle is emitted. This is the Hopf
bifurcation.
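
The behavior described for (9.5) can be observed by direct numerical
integration. The following sketch uses a classical fourth-order
Runge-Kutta step; the step size, time horizon, starting points and the
names `field`, `rk4_step` and `final_radius` are arbitrary illustrative
choices rather than anything prescribed by the text.

```python
import math


def field(x, y, lam):
    # Right-hand side of (9.5).
    s = x * x + y * y
    return lam * x - y - x * s, x + lam * y - y * s


def rk4_step(x, y, lam, h):
    # One classical fourth-order Runge-Kutta step of size h.
    k1 = field(x, y, lam)
    k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1], lam)
    k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1], lam)
    k4 = field(x + h * k3[0], y + h * k3[1], lam)
    x_new = x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    y_new = y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x_new, y_new


def final_radius(x, y, lam, t_end=40.0, h=0.01):
    # Distance from the origin after integrating up to time t_end.
    for _ in range(int(round(t_end / h))):
        x, y = rk4_step(x, y, lam, h)
    return math.hypot(x, y)


lam = 0.25
print("from inside :", final_radius(0.05, 0.0, lam))   # approaches sqrt(lam)
print("from outside:", final_radius(2.0, 0.0, lam))    # approaches sqrt(lam)
print("lam < 0     :", final_radius(1.0, 0.0, -0.25))  # decays toward 0
```

Starting inside or outside the circle of radius $\sqrt{\lambda}$, the computed
radius settles on $\sqrt{\lambda}$ when $\lambda > 0$, while for $\lambda < 0$ it decays
toward the origin.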

The Hopf bifurcation does occur naturally in a biological
context, namely in predator-prey equations. In most forms of the
predator-prey equations (one species each) there is a unique equilibrium point
where both can coexist. Depending on the choice of parameters this
equilibrium is either asymptotically stable or unstable. Here we are
excluding the original Lotka-Volterra equation itself with its
structurally unstable foliation of the first quadrant by cycles. When
the equilibrium is unstable it is enclosed by a cycle. This is all
discussed in May's book [23, Chap. 4]. In the Appendix he discusses
a particular model in detail and gives a diagram [23, p. 192, Fig. A.1]
showing the values of the parameters for which the equilibrium is
stable or unstable. Changing the parameters so as to cross from the
stable to the unstable region is an example of a Hopf bifurcation.

Both sorts of bifurcations are discovered by looking at the
linear approximation of the differential equation near the
equilibrium, essentially the derivative of the vectorfield at the
equilibrium point. If the eigenvalues of the linear part are all negative
or have negative real part if complex, then the equilibrium is
asymptotically stable. The first sort of bifurcation is caused by a real
eigenvalue changing from negative to positive as the parameter of the
equation changes. The Hopf bifurcation occurs when a complex
conjugate pair of eigenvalues crosses the imaginary axis, i.e. the real
part changes from negative to positive. The important point here is
that complex eigenvalues are needed for a Hopf bifurcation. In
general, the real part of an eigenvalue tells the radial rate at which
the solution approaches or leaves the equilibrium. The imaginary part
tells the angular rate at which the solution spirals around in some
plane containing the equilibrium. So the imaginary part is needed to
give the twist around the origin which becomes a cycle when the radial
rate changes signs.

Hopf bifurcations can't occur in families of gradient fields
because, as mentioned above, cycles can't occur. In fact, complex
eigenvalues can't occur. In essence, the matrix of the linearization
of the gradient of f is $(\partial^2 f/\partial x_i \partial x_j)$ and this is symmetric.
Symmetric matrices have only real eigenvalues.
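
The contrast between the two linearizations can be seen in a small
computation. The sketch below takes the linear part of (9.5) at the
origin, whose matrix has rows $(\lambda, -1)$ and $(1, \lambda)$, and compares it
with an arbitrary symmetric matrix standing in for the Hessian of a
potential. The random symmetric matrix and the names `hopf_linear_part`
and `symmetric_example` are hypothetical illustrations.

```python
import numpy as np


def hopf_linear_part(lam):
    # Derivative of the vectorfield (9.5) at the origin.
    return np.array([[lam, -1.0],
                     [1.0, lam]])


def symmetric_example(n, seed=0):
    # A random symmetric matrix, standing in for a matrix of second
    # partial derivatives of some potential f.
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(n, n))
    return (M + M.T) / 2.0


for lam in (-0.5, 0.0, 0.5):
    eigs = np.linalg.eigvals(hopf_linear_part(lam))
    # The pair lam + i, lam - i crosses the imaginary axis at lam = 0.
    print("lambda =", lam, "eigenvalues =", eigs)

sym_eigs = np.linalg.eigvals(symmetric_example(4))
print("symmetric case, imaginary parts:", np.imag(sym_eigs))
```

The real part of the complex pair equals $\lambda$ and changes sign at the
bifurcation, while the imaginary part stays at $\pm 1$; the symmetric
matrix has no imaginary parts at all.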

Now in Chap. IV we define for a vectorfield X on $\Delta$ the
Hessian $H_pX$ at each point p of $\Delta$. It is a real-valued bilinear map
on the tangent space $T_p\Delta$ closely related to the linearization of the
vectorfield. This Hessian is symmetric at every point p of $\Delta$ if
and only if the vectorfield X is a gradient with respect to the
Shahshahani metric. Since recombination is not a gradient there
exist points p of $\mathring{\Delta}$ where the Hessian is not symmetric. Fix such
a point p. By adding in various symmetric terms we can get any real
parts we want for the eigenvalues, in particular negative, zero and
positive. Since the Hessian is not symmetric we can also make sure
that some of the eigenvalues have nonzero imaginary parts. The
variation of the symmetric part of the Hessian comes from the selection
field.


We saw with equation (6.6) that we could think of the fitness
numbers $m_{ij}$ as built from $\bar{m}$, $m_i - \bar{m}$ and $\theta_{ij}$. Now with p fixed we
construct the selection fields to do what we want. First, choose $\bar{m}$
at p to be anything, say 0. Second, choose $m_i - \bar{m}$ at p to equal
$-X_i(p)$ where $\sum_i X_i(p)\,\partial_i$ is the field we are looking at. This means
that whatever $\theta_{ij}$ is, $\nabla(\tfrac{1}{2}\bar{m}) + X$ has p as an equilibrium. Finally,
$\theta_{ij}$ is an arbitrary symmetric matrix subject only to the conditions
$\theta_i = \sum_j p_j \theta_{ij} = 0$ for all i. It turns out that there is still enough
room to choose so that the $\theta_{ij}$'s can be chosen to give any symmetric
Hessian $H_p(\nabla(\tfrac{1}{2}\bar{m}))$ we want on the subspace $T_p\mathring{\Delta}$ of $\mathbb{R}^I$. So by varying
our choice of $\theta$'s we can cause a Hopf bifurcation to occur at p.

The result is the most unsatisfying mathematical theorem: 
an existence proof. It leaves open several technical questions: Can 
the cycles which occur be asymptotically stable, i.e. limit cycles, 
like our simple examples? Are the cycles structurally stable or 
are they transient phenomena occurring only for peculiar isolated 
selection values? Answers to these questions require the construction 
and detailed study of particular examples. Also awaiting explicit 
examples is the more important question of the biological meaning of 
these examples. Do they have real significance or are they located 
in some biologically grotesque region of the space of selection 
matrices? In short, can the Wright Conjecture be revived for some 
restricted class of examples broader than zero-epistasis cases? 

I mean no disrespect to Sewall Wright by attaching his name to
a false conjecture. I thought it was true when I named it. Furthermore,
as the above remarks indicate it may yet be true for real
biological models. However, now I no longer think so.

I think of this result as analogous to Smale's result in [30]. 
There he shows that despite an apparently restrictive collection of 
axioms for equations modelling ecological competition, essentially 
any sort of dynamical behavior is possible given enough species. 

Here we are dealing with a much tighter class of models and within it 
we have discovered only the simplest sort of pathology, namely cycles. 
However, cycles are the first thing to look for after equilibria. 

Since we discovered them so easily, I suspect that much more
complicated dynamical behavior lurks in these multilocus models. But
again whether the dynamical complexity in Smale's models or ours
is biological or merely mathematical can only be elucidated by the
study of particular examples.



II. The Geometry of Epistasis 


1. Orthogonal Decompositions.

Let I be a set containing n elements. In Sec. 4 of the
previous chapter we defined the subsets $P = \{x \in \mathbb{R}^I : x_i \geq 0 \text{ for } i \in I\}$
and $\Delta = \{x \in P : \sum_i x_i = 1\}$. By the formula I.(4.1) we defined the
Shahshahani metric, a Riemannian metric on $\mathring{P} = \{x \in \mathbb{R}^I : x_i > 0 \text{ for } i \in I\}$.
It restricts to a metric on $\mathring{\Delta} = \Delta \cap \mathring{P}$. In Table 4 of Sec.
I.4 we defined the maps $E^a(x) = \sum_i a_i x_i$ and $L^b(x) = \sum_i b_i \ln x_i$ for
$x \in \mathring{P}$, with $a, b \in \mathbb{R}^I$. Note that the associations $a \mapsto E^a$ and $b \mapsto L^b$
are linear.

Now let A, B be a splitting of $\mathbb{R}^I$ orthogonal with respect to
the usual inner product, ( , ). That is, $\mathbb{R}^I$ is the direct sum of
subspaces A and B and $a \in A$, $b \in B$ imply $\sum_i a_i b_i = (a,b) = 0$.
Choose bases $\{a^1,\ldots,a^t\}$ for A and $\{b^1,\ldots,b^d\}$ for B, so that
d + t = n. Define the maps $E^A: \mathbb{R}^I \to \mathbb{R}^t$ and $L^B: \mathring{P} \to \mathbb{R}^d$ by

         $E^A(x) = (E^{a^1}(x),\ldots,E^{a^t}(x))$
         $L^B(x) = (L^{b^1}(x),\ldots,L^{b^d}(x)).$

The above notation is somewhat abusive in that $E^A$ and $L^B$ really
depend on the choice of bases, but a different choice of basis just
changes the definition by multiplication by a nonsingular matrix in
the range. Actually one can define an invariant version of $E^A$ and $L^B$
mapping to the dual spaces of A and B respectively so that the
choice of bases simply amounts to choosing coordinates on the range.
$E^A: \mathring{P} \to A^*$ and $L^B: \mathring{P} \to B^*$ would then be defined by:

(1.1)    $\langle E^A(x), a\rangle = E^a(x) = (a, x),$
         $\langle L^B(x), b\rangle = L^b(x) = (b, \ln x),$

where for $x \in \mathring{P}$ we denote by $\ln x$ the vector whose $i$th coordinate is
$\ln x_i$. Thus, $\ln: \mathring{P} \to \mathbb{R}^I$ is a diffeomorphism.

$E^A: \mathbb{R}^I \to \mathbb{R}^t$ is an onto linear map with kernel B and $L^B$ is the
composition of $\ln$ with $E^B: \mathbb{R}^I \to \mathbb{R}^d$, an onto linear map with kernel A.
Thus, $E^A$ maps $\mathring{P}$ onto an open cone in $\mathbb{R}^t$ and P onto $P_A$ (the
closure of $\mathring{P}_A$) while $L^B$ maps $\mathring{P}$ onto all of $\mathbb{R}^d$. Furthermore, it is
clear that

(1.2)    $E^A(x_1) = E^A(x_2)$  iff  $x_1 - x_2 \in B$
         $L^B(x_1) = L^B(x_2)$  iff  $\ln x_1 - \ln x_2 \in A.$


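1 Theorem: Let $A \oplus B = \mathbb{R}^I$ be an orthogonal splitting with respect to
the usual inner product ( , ) on $\mathbb{R}^I$, with $1 \in A$. Let $\{a^\alpha : \alpha = 1,\ldots,t\}$
and $\{b^\beta : \beta = 1,\ldots,d\}$ be bases for A and B respectively. Define
$E^A: \mathbb{R}^I \to \mathbb{R}^t$ and $L^B: \mathring{P} \to \mathbb{R}^d$ by

(1.3)    $E^A(x)_\alpha = E^{a^\alpha}(x) = (a^\alpha, x) = \sum_i a^\alpha_i x_i$
         $L^B(x)_\beta = L^{b^\beta}(x) = (b^\beta, \ln x) = \sum_i b^\beta_i \ln x_i,$

and let $\mathring{P}_A = E^A(\mathring{P})$. $\mathring{P}_A$ is an open cone in $\mathbb{R}^t$ and $E^A \times L^B: \mathring{P} \to \mathring{P}_A \times \mathbb{R}^d$
is a diffeomorphism with the following diagram commuting: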
We will denote by $\mathcal{D}$ the "foliation of fibres" of $E^A$, whose
leaves are of the form $(E^A)^{-1}$(point). The leaves of $\mathcal{D}$ are bounded,
convex sets each open in a hypersurface of dimension d and $L^B$ yields
a diffeomorphism of each leaf with $\mathbb{R}^d$. We will denote by $\mathcal{J}$ the
"transverse foliation" whose leaves are of the form $(L^B)^{-1}$(point).
With respect to the Shahshahani metric these foliations yield an
orthogonal splitting at each point. For $x \in \mathring{P}$

(1.4)    $T_x\mathcal{D} = \{\nabla_x L^b : b \in B\} = [\nabla_x L^{b^\beta} : \beta = 1,\ldots,d]$
         $T_x\mathcal{J} = \{\nabla_x E^a : a \in A\} = [\nabla_x E^{a^\alpha} : \alpha = 1,\ldots,t]$

where [...] means the subspace spanned by the listed vectors.
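
The orthogonality asserted in (1.4) can be spot-checked numerically. The
sketch below assumes that, up to a positive scalar factor which does not
affect orthogonality, the Shahshahani inner product at x is
$(u,v)_x = \sum_i u_i v_i / x_i$; under that assumption the gradient of $E^a$ has
components $a_i x_i$ and the gradient of $L^b$ is b itself. The random
splitting and the names `orthogonal_splitting`, `shahshahani_inner`,
`grad_E` and `grad_L` are hypothetical.

```python
import numpy as np


def orthogonal_splitting(n, t, rng):
    # Random orthogonal splitting of R^n into A (dimension t) and B,
    # arranged so that the all-ones vector lies in A.
    M = rng.normal(size=(n, n))
    M[:, 0] = 1.0
    Q, _ = np.linalg.qr(M)        # first column is parallel to the ones vector
    return Q[:, :t].T, Q[:, t:].T  # rows are the basis vectors


def shahshahani_inner(u, v, x):
    # Assumed form of the metric at x (any positive scalar factor omitted).
    return float(np.sum(u * v / x))


def grad_E(a, x):
    # Gradient of E^a(x) = sum a_i x_i with respect to the assumed metric.
    return a * x


def grad_L(b, x):
    # Gradient of L^b(x) = sum b_i ln x_i with respect to the assumed metric.
    return np.array(b, dtype=float)


rng = np.random.default_rng(1)
n, t = 6, 2
A_basis, B_basis = orthogonal_splitting(n, t, rng)
x = rng.uniform(0.1, 1.0, size=n)   # a point with all coordinates positive

products = [shahshahani_inner(grad_E(a, x), grad_L(b, x), x)
            for a in A_basis for b in B_basis]
gradients = np.array([grad_E(a, x) for a in A_basis] +
                     [grad_L(b, x) for b in B_basis])

print("largest |inner product|:", max(abs(p) for p in products))
print("rank of all gradients  :", np.linalg.matrix_rank(gradients))
```

Every mixed inner product vanishes up to rounding and the n gradients
together have full rank, in agreement with the orthogonal splitting of
the tangent space at x.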

Proof: By I (4.13) the subspaces of $T_x\mathring{P}$: $\{\nabla_x E^a\}$ and $\{\nabla_x L^b\}$ are
perpendicular with respect to $( , )_x$. Since they have bases $\{\nabla_x E^{a^\alpha}\}$
and $\{\nabla_x L^{b^\beta}\}$ respectively and since $d + t = n = \dim T_x\mathring{P}$, they form an
orthogonal splitting of $T_x\mathring{P}$ with respect to $( , )_x$.

$E^A \times L^B$ maps $\mathring{P}$ to $\mathbb{R}^{d+t}$ and the gradients of the components
are everywhere linearly independent. So the derivative of this map
is an isomorphism at every point. It follows from the inverse
function theorem that $E^A \times L^B$ is locally a diffeomorphism. Furthermore
the tangent space of $\mathcal{D}$ at x is the $( , )_x$ perpendicular complement
of $[\nabla_x E^{a^\alpha} : \alpha = 1,\ldots,t]$ which is the space spanned by the $\nabla_x L^{b^\beta}$'s.
(1.4) follows.

A B * ° d 

To complete the proof we have to show that E x L : P P^ x R 

A 

is one-to-one and onto. Since E maps P onto P^ it suffices to 
choose e P, let = E A (x^) and show that L B maps 
(E A | P) ^(z 0 ) = (E A ) 1 (z°) n P = (x° + B) fl P one-to-one and onto R d . 
Because 1 e A, the set (x^ + B) fl P <= {x: |x| = | x^ | ) fl P and so is 
bounded. We will prove bijectivity using the following topological 
lemma whose proof we will postpone until the appendix. 

2 Lemma: Let $U$ be an open set in Euclidean space and $F: U \to R^d$ be a local homeomorphism. If $F$ is topologically proper, i.e. $F^{-1}(C)$ is compact whenever $C$ is compact, then $F$ is a homeomorphism and so is one-to-one and onto $R^d$.

So we are left with checking that on $(x^0 + B) \cap P^\circ = \mathcal{D}_{x^0}$, $L^B$ is topologically proper. If it is not then there exists a sequence $\{x^n\}$ in $\mathcal{D}_{x^0}$ with no subsequence convergent in $\mathcal{D}_{x^0}$ but with $\{L^B(x^n)\}$ bounded. Since $(x^0 + B) \cap P$ is compact we can assume by going to a subsequence that $x^n$ converges to a point $x^\infty \in (x^0 + B) \cap (P - P^\circ)$. So $x^\infty_i \ge 0$ for all $i$, but the set $I_\infty = \{i: x^\infty_i = 0\}$ is nonempty. Let $b = x^\infty - x^0$. Then $b \in B$, and since $x^0 \in P^\circ$, $b_i < 0$ for $i \in I_\infty$.

$L^b(x^n) = \sum\{b_i \ln x^n_i : i \in I_\infty\} + \sum\{b_i \ln x^n_i : i \in I - I_\infty\}$.

In the second sum each term is bounded as $n \to \infty$. In the first sum each term tends to $+\infty$ as $n \to \infty$. Hence, $L^b(x^n)$ tends to $+\infty$. But since $b$ is a linear combination of the $b^\beta$'s and $\{L^B(x^n)\}$ was assumed to be bounded, $\{L^b(x^n)\}$ is bounded. This contradiction completes the proof.
QED

Remarks: 1. In choosing the basis for $A$, we can assume that $a^t = 1$ and so $P_A \cap \{z \in R^t : z_t = 1\} = \Delta_A$ and $P_A^\circ \cap \{z \in R^t : z_t = 1\} = \Delta_A^\circ$ are the images $E^A(\Delta)$ and $E^A(\Delta^\circ)$ respectively. $\Delta_A$ is a closed convex cell of dimension $t-1$ with interior $\Delta_A^\circ$ (relative to $\{z_t = 1\}$). $E^A \times L^B$ on $P^\circ$ restricts to a diffeomorphism in the diagram

Restricting to $\Delta^\circ$, the foliation of fibres of $E^A$ consists of those leaves of $\mathcal{D}$ which meet $\Delta^\circ$. The transverse foliation $\bar{\mathcal{T}}$ consists of the leaves of $\mathcal{T}$ intersected with $\Delta^\circ$. For $p \in \Delta^\circ$

(1.5)    $T_p\mathcal{D} = \{\nabla_p L^b = \bar\nabla_p L^b : b \in B\} = [\nabla_p L^{b^\beta} : \beta = 1,\dots,d]$,
         $T_p\bar{\mathcal{T}} = T_p(\mathcal{T} \cap \Delta^\circ) = \{\bar\nabla_p E^a : a \in A\} = [\bar\nabla_p E^{a^\alpha} : \alpha = 1,\dots,t-1]$.

For the second equation we assume $a^t = 1$ and recall that $\bar\nabla_p E^1 = 0$.

2. In applications we will often define $E^A$ mapping to $R^\ell$ where $a^1,\dots,a^\ell$ ($\ell \ge t$) is some spanning set for $A$. In this case $P_A^\circ$ is a cone, open in a $t$-dimensional subspace, $E^A(R^I)$, of $R^\ell$, and by the Image Problem we will denote the problem of describing this subspace. (1.4), (1.5) and Table I.4.4 imply:


3 Addendum: (a) The vector $\sum_i x_i \xi_i \partial_i \in T_x\mathcal{T}$ iff $\xi \in A$ iff $\sum_i b_i \xi_i = 0$ for all $b \in B$. If $x = p \in \Delta^\circ$ then $\sum_i p_i \xi_i \partial_i \in T_p\bar{\mathcal{T}}$ iff in addition $\bar\xi = \sum_i p_i \xi_i = 0$.

(b) The vector $\sum_i X_i \partial_i \in T_x\mathcal{D}$ iff $X \in B$ iff $\sum_i a_i X_i = 0$ for all $a \in A$.

Recall that $\bar\xi = \sum_i p_i \xi_i = (\sum_i p_i \xi_i \partial_i, \nabla_p E^1)_p$. Note that in considering the transverse direction in part (a) we consider the relative rates $\xi_i$ while for the fibre direction in (b) we use the absolute rates $X_i$.

4 Corollary: Let $f$ be a continuously differentiable real-valued function on $P$. The following are equivalent:

(a) $f$ factors through $E^A$, i.e. there is a differentiable real-valued function $f_A: P_A \to R$ such that $f = f_A \circ E^A$.

(b) For all $x \in P$, $\partial f/\partial x \in A$.

(c) For all $x \in P^\circ$, $(\nabla_x f, \nabla_x L^b)_x = 0$ for all $b \in B$.

Proof: By continuity the (unique) continuous factoring of $f$ on $P$ exists iff it exists on $P^\circ$. Since $\partial f/\partial x_i$ is continuous in $x$, (b) holds for all $x$ iff it holds for $x \in P^\circ$. (b) on $P^\circ$ and (c) on $P^\circ$ are clearly equivalent and these are equivalent to $f$ being constant on the fibres of $E^A$ (see Prop. I.3.3), which is equivalent to the existence of a continuous factoring. Such a factoring is automatically as smooth as $f$ because the onto linear map $E^A$ has a right inverse $\bar E: R^t \to R^I$ and so $f_A = f \circ \bar E$ on $P_A$. QED

Remark: The same result goes through if $f$ is defined only on $\Delta$ and yields a factoring to a map on $\Delta_A$. Simply apply the result to $f \circ \rho$ where $\rho: P \to \Delta$ is the projection. Also, since $\nabla L^b$ is perpendicular to $p$ and $\nabla f$ differs from $\bar\nabla f$ by a multiple of $p$, we can replace $\nabla f$ by $\bar\nabla f$ in (c).



For applications to fitness, the following special case is 
important: 

5 Corollary: Let $m$ be a symmetric element of $R^{I \times I}$. $\bar m: \Delta \to R$ is defined by $\bar m(p) = \sum_{i,j} p_i p_j m_{ij}$. The following are equivalent:

(a) $\bar m$ factors through $E^A$.

(b) For every $p \in \Delta$, the element of $R^I$ defined by $i \to m_i(p) = \sum_j p_j m_{ij}$ lies in $A$.

(c) For every $j \in I$, the element of $R^I$ defined by $i \to m_{ij}$ lies in $A$.

(d) $B$ is contained in the annihilator of the bilinear map defined by the matrix $m$.

Proof: Since $\partial \bar m/\partial x_i$ at $p$ is equal to twice $m_i(p)$, the equivalence of (a) and (b) follows from Cor. 4. (b) implies (c) by using the distribution $p$ concentrated on $j$, i.e. $p_i = \delta_{ij}$ ($= 0$ or $1$ as $i \ne j$ or $i = j$), and (c) implies (b) because $A$ is closed under linear combination. Finally, (d) means that $\sum_i b_i m_{ij} = 0$ for all $j$ and all $b \in B$. This is equivalent to (c) because $A$ and $B$ are orthogonal complements. QED


For the foliation $\bar{\mathcal{T}}$ of $\Delta^\circ$ there is one particular distinguished leaf:

(1.6)    $\Lambda_A = (L^B)^{-1}(0) = \{p \in \Delta^\circ : L^b(p) = 0 \text{ for all } b \in B\}$.

$\Lambda_A$ is distinguished in being the leaf of greatest randomness relative to a choice of projection under $E^A$. Here randomness is measured by the entropy function $H(p) = -\sum_i p_i \ln p_i$. The meaning of this statement is the following:

6 Theorem: For $z \in \Delta_A^\circ$ let $\pi(z)$ be the point of $\Lambda_A$ with $E^A(\pi(z)) = z$, i.e. let $\pi(z) = (E^A \times L^B)^{-1}(z,0)$. $\pi(z)$ is the unique point of $(E^A|\Delta)^{-1}(z)$ at which the entropy function $H$ achieves its maximum. $\pi(z)$ is also the unique point $p$ of $(E^A|\Delta^\circ)^{-1}(z)$ such that $\ln p$ is a vector in $A$.


Proof: First note that $\ln p \in A$ iff $L^b(p) = (b, \ln p) = 0$ for all $b \in B$ iff $L^B(p)$ is the zero vector. Now define $N^B: \Delta^\circ \to R$ by $N^B(p) = \sum_\beta (L^{b^\beta}(p))^2$. Then $\bar\nabla_p N^B = 2 \sum_\beta L^{b^\beta}(p)\, \bar\nabla_p L^{b^\beta}$. Thus, the gradient of $N^B$ is tangent to the fibres of $E^A$, i.e. the leaves of $\mathcal{D}$, by (1.5). By equation I(4.13):

(1.7)    $(\bar\nabla_p N^B, \bar\nabla_p H)_p = -2N^B(p)$.

Thus, as one moves down the gradient of $N^B$ toward the point $\pi$ where $N^B = 0$, or equivalently where $L^B$ is the zero vector, $H$ strictly increases.

To make this precise, define $\tilde H: \Delta_A^\circ \times R_+ \to R$ by:

$\tilde H(z,s) = \sup\{H(p) : E^A(p) = z \text{ and } N^B(p) = s\}$.

Since $(E^A \times N^B)^{-1}(z,s)$ is compact (it is a $d-1$ dimensional sphere if $s > 0$ and a point if $s = 0$), $\tilde H$ is continuous. I claim that it is strictly decreasing in $s$. For let $s_1 > s_2 \ge 0$ and let $p_1 \in (E^A \times N^B)^{-1}(z,s_1)$ with $H(p_1) = \tilde H(z,s_1)$. Flow along the vectorfield $-\bar\nabla N^B$ starting at $p_1$ until one reaches a point $p_2$ with $N^B(p_2) = s_2$. Since the flow remains in the fibre of $E^A$, $E^A(p_2) = z$. Hence, $\tilde H(z,s_2) \ge H(p_2) > H(p_1) = \tilde H(z,s_1)$. It is clear that $\pi(z) = (E^A \times L^B)^{-1}(z,0)$ is the maximum of $H$ on $(E^A|\Delta^\circ)^{-1}(z)$. Furthermore, for any $s > 0$, $\{N^B > s\}$ is a neighborhood of the boundary in $(E^A|\Delta)^{-1}(z)$ and so it follows that $\sup H|((E^A)^{-1}(z) \cap (\Delta - \Delta^\circ)) \le \inf\{\tilde H(z,s) : s > 0\} < \tilde H(z,0) = H(\pi(z))$. Thus $\pi(z)$ is the unique maximum point even when the boundary is included. QED
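A small numerical illustration of Theorem 6 may help. This is a sketch under the assumption of the 2 x 2 table with $A$ spanned by the constant vector and the two locus indicators, so that $B$ is spanned by $b = (1,-1,-1,1)$; in that case the point $\pi(z)$ should be the table with independent loci and the same marginals. The sample distribution and all function names are hypothetical.

```python
import math

B_VEC = (1, -1, -1, 1)          # spans B for the 2 x 2 table (illustrative)

def entropy(p):
    """H(p) = -sum p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def product_of_marginals(p):
    """Candidate for pi(z): the independent table with the marginals of p."""
    r0 = p[0] + p[1]            # first locus, allele 0
    c0 = p[0] + p[2]            # second locus, allele 0
    return (r0 * c0, r0 * (1 - c0), (1 - r0) * c0, (1 - r0) * (1 - c0))

p = (0.4, 0.1, 0.2, 0.3)
pi = product_of_marginals(p)

# L^b(pi) = (b, ln pi) should vanish, i.e. ln pi lies in A.
log_odds = sum(b * math.log(q) for b, q in zip(B_VEC, pi))

def fibre_point(s):
    """Point pi + s*b on the fibre of E^A through p."""
    return tuple(q + s * b for q, b in zip(pi, B_VEC))

# Scan the fibre inside the closed simplex and locate the entropy maximum.
s_min = -min(pi[0], pi[3])
s_max = min(pi[1], pi[2])
grid = [s_min + (s_max - s_min) * k / 200 for k in range(201)]
best_s = max(grid, key=lambda s: entropy(fibre_point(s)))
```

On this grid the maximum sits at `s = 0`, that is at `pi` itself, and the endpoints of the grid are the boundary points of the fibre, in line with the last step of the proof.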


Remark: If $p \in \Lambda_A$ then the gradient of entropy at $p$, $\bar\nabla_p H$, is tangent to $\Lambda_A$, i.e. lies in $T_p\Lambda_A$. This is clear by direct computation and Addendum 3(a). Alternatively, since $H|\mathcal{D}_p$ achieves its maximum at $p$, $\bar\nabla_p H$ must be perpendicular to $T_p\mathcal{D}$ and so lies in $T_p\Lambda_A$ (since $\Lambda_A = \bar{\mathcal{T}}_p$).

All of the results of this chapter can be reinterpreted in statistical language. The leaves of $\mathcal{T}$ correspond to what statisticians call log-linear restrictions on frequency tables for the finite set $I$. Such a restricted set of frequencies is $\{x \in P^\circ : \sum_i b_i \ln x_i = c_b \text{ for all } b \in B\}$ where the $c_b$ are constants, depending on $b \in B$. This set is a leaf of $\mathcal{T}$ and different consistent choices of constants define different leaves. One of the consequences of Thm. 1 is that the only consistency condition needed to define a leaf is linearity of $c_b$ in $B$.

Alternatively, if $x^0 \in P^\circ$ then by (1.2) the leaf $\mathcal{T}_{x^0} = \{x \in P^\circ : \ln x - \ln x^0 \in A\}$, i.e. $\mathcal{T}_{x^0}$ can be parametrized by $A$ via the map

(1.8)    $x(a)_i = e^{a_i} x^0_i$.

$a \to x(a)$ defines a diffeomorphism of $A$ onto $\mathcal{T}_{x^0}$ mapping $0$ to $x^0$.

The leaves of $\bar{\mathcal{T}}$ in $\Delta^\circ$ satisfy the additional restriction $\sum_i p_i = 1$. If $p^0 \in \Delta^\circ$ then $\bar{\mathcal{T}}_{p^0}$ can be parametrized by $A$ via the map:

(1.9)    $p(a)_i = C(a)^{-1} e^{a_i} p^0_i$,    $C(a) = \sum_i p^0_i e^{a_i}$.

The map $a \to p(a)$ is onto but not injective. $p(a^1) = p(a^2)$ iff the ratios $\exp(a^1_i) : \exp(a^2_i)$ are independent of $i$, i.e. iff $a^1 - a^2$ is a multiple of the vector $1$. Thus, if we restrict the map to a complement $A_0$ of $[1]$ in $A$ we do get a diffeomorphism of $A_0$ onto $\bar{\mathcal{T}}_{p^0}$.

To a statistician, (1.9) says that each leaf of $\bar{\mathcal{T}}$ is a $t-1$ dimensional exponential family of distributions on the finite set $I$. Any exponential family of distributions on $I$ can be exhibited this way.

In particular, since the leaf $\Lambda_A$ clearly contains the center of the simplex $p^0$ with $p^0_i = 1/n$, $\Lambda_A$ can be parametrized by $A_0$ via:

(1.10)    $p(a)_i = C(a)^{-1} e^{a_i}$,    $C(a) = \sum_i e^{a_i}$,

where we are absorbing the constant $1/n$ into $C$.
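The parametrization (1.9) can be exercised directly. The sketch below assumes the same illustrative 2 x 2 setting as before ($A$ spanned by the constant vector and the two locus indicators, $B$ by $(1,-1,-1,1)$); the helper names `p_of_a`, `in_A` and `L_b` are hypothetical.

```python
import math

A_BASIS = [(1, 1, 0, 0), (1, 0, 1, 0), (1, 1, 1, 1)]   # illustrative basis of A
B_VEC = (1, -1, -1, 1)                                  # spans B

def p_of_a(a, p0):
    """(1.9): p(a)_i = C(a)^{-1} exp(a_i) p0_i with C(a) = sum_i p0_i exp(a_i)."""
    w = [q * math.exp(ai) for q, ai in zip(p0, a)]
    C = sum(w)
    return tuple(wi / C for wi in w)

def in_A(coeffs):
    """Vector of A with the given coordinates relative to A_BASIS."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, A_BASIS)) for i in range(4))

def L_b(p):
    """L^b(p) = (b, ln p) for the spanning vector b of B."""
    return sum(b * math.log(q) for b, q in zip(B_VEC, p))

p0 = (0.4, 0.1, 0.2, 0.3)
a1 = in_A((0.7, -0.4, 0.0))
a2 = in_A((0.7, -0.4, 2.5))     # differs from a1 by a multiple of the vector 1

q1 = p_of_a(a1, p0)
q2 = p_of_a(a2, p0)
```

Here `q1` and `q2` coincide, which is the non-injectivity noted after (1.9), and `L_b(q1)` equals `L_b(p0)`, so the whole family stays on the leaf through `p0`.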

The theory of contingency tables [12] provides another viewpoint. The linear map $E^A$ on $\Delta$ corresponds to what Gokhale and Kullback call the design matrix. In applications, instead of knowing the entire distribution vector $p$ we only know $z = E^A(p)$. The family $(E^A)^{-1}(z)$ of all distributions corresponding to $z$ is, at least in $\Delta^\circ$, a leaf of the foliation $\mathcal{D}$. Of special interest in this leaf is the point $\pi(z)$ in $\Lambda_A$ which is in some sense the distribution with the most independence among the elements of $I$ subject to the constraint imposed by the design matrix and the fixed vector $z$. Now for $p \in \Delta^\circ$ let $\pi$ be the unique element of $\Lambda_A \cap \mathcal{D}_p$, i.e. $E^A(p) = E^A(\pi)$ and $\pi \in \Lambda_A$. So $E^A \times L^B(\pi) = (E^A(p), 0)$. Define the normalized entropy:

(1.11)    $\hat H(p) = H(p) - H(\pi)$
          $= -\sum_i p_i \ln p_i + \sum_i \pi_i \ln \pi_i$
          $= -\sum_i p_i \ln(p_i/\pi_i)$.

The last equation in (1.11) is true because $p - \pi \in B$ by (1.1) and $\ln \pi \in A$ by Thm. 6. So with the usual inner product $(p, \ln \pi) = (\pi, \ln \pi)$.

7 Lemma: $\hat H: \Delta^\circ \to R$ satisfies the following:

(a) $\hat H(p) \le 0$ for $p \in \Delta^\circ$.

(b) $\hat H(p) = 0$ iff $p \in \Lambda_A$.

(c) With respect to the decomposition $T_p\Delta^\circ = T_p\bar{\mathcal{T}} \oplus T_p\mathcal{D}$ the $T_p\mathcal{D}$ components of $\bar\nabla_p \hat H$ and $\bar\nabla_p H$ agree.

(d) $\bar\nabla_p \hat H = 0$ iff $p \in \Lambda_A$.

Proof: The function $H(\pi) = H - \hat H$ is constant on the leaves of $\mathcal{D}$. So the gradient of this function is everywhere perpendicular to $T_p\mathcal{D}$. This proves (c). On the fibre $(E^A)^{-1}(z)$, $H$ has a strict maximum at $\pi$ by Thm. 6. So $\hat H$ has a strict maximum at $\pi$, and at $\pi$, $p = \pi$ so $\hat H = 0$. This proves (a) and (b). Now if $p \notin \Lambda_A$ then by (1.7)

$(\bar\nabla_p N^B, \bar\nabla_p \hat H)_p = (\bar\nabla_p N^B, \bar\nabla_p H)_p = -2N^B(p) \ne 0$.

The two dot products agree by (c) because $\bar\nabla_p N^B \in T_p\mathcal{D}$. Finally, if $p \in \Lambda_A$, $\bar\nabla_p H \in T_p\Lambda_A$ by the remark after Thm. 6. $\bar\nabla_p(H - \hat H) \in T_p\Lambda_A$ because $H - \hat H$ is constant on the leaves of $\mathcal{D}$. So $\bar\nabla_p \hat H \in T_p\Lambda_A$ for all $p \in \Lambda_A$. This means $\bar\nabla_p \hat H$ on $\Lambda_A$ is $\nabla(\hat H|\Lambda_A)$, where this gradient is taken with respect to the Riemannian metric restricted to $\Lambda_A$. But $\hat H|\Lambda_A$ is constantly zero. So its gradient is zero. This proves (d).

QED

On each fibre of $\mathcal{D}$ the negative $-\hat H$ is Kullback's information discrimination $I(p:\pi) = \sum_i p_i \ln(p_i/\pi_i)$; see [12] and [21]. So the leaf $\Lambda_A$ consists of the minimum discrimination information (MDI) estimates of $p$ subject to the design matrix constraint $E^A(p) = z$.
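The identity between the normalized entropy and the discrimination information is easy to confirm numerically. The sketch below again assumes the illustrative 2 x 2 table, where the point $\pi$ for the sample distribution `p` is the product of its marginals; the numbers and function names are hypothetical.

```python
import math

def entropy(p):
    """H(p) = -sum p_i ln p_i for a strictly positive distribution."""
    return -sum(pi * math.log(pi) for pi in p)

def discrimination(p, q):
    """Kullback's I(p:q) = sum p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = (0.4, 0.1, 0.2, 0.3)
pi = (0.3, 0.2, 0.3, 0.2)       # product of the marginals (0.5, 0.5) and (0.6, 0.4)

H_hat = entropy(p) - entropy(pi)    # normalized entropy, first line of (1.11)
```

The agreement of `H_hat` with `-discrimination(p, pi)` depends on `p - pi` lying in $B$ and `ln pi` lying in $A$; for a `pi` chosen off the leaf $\Lambda_A$ the two quantities would in general differ.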

In the usual applications the set $I$ is a product and the design matrix constraints correspond to knowing certain marginal distributions or joint distributions on some subproducts. This is exactly the case to which we now turn.

2. The Product Model.

The set $I$ is the Cartesian product of the sets $I_\alpha$, $\alpha = 1,\dots,\ell$. $L = \{1,\dots,\ell\}$ is the index set of loci. A complex $K$ is a nonempty collection of subsets of $L$ such that if $S_1 \subset S_2$ and $S_2 \in K$ then $S_1 \in K$. We will repeatedly identify a subset $S$ of $L$ with the complex consisting of $S$ and all of its subsets. Note that the empty set, $\emptyset$, lies in every complex $K$. If $S \subset L$, $I_S = \prod\{I_\alpha : \alpha \in S\}$, and if $i \in I$, $i_S$ is the element of $I_S$ whose $\alpha$ coordinate is $i_\alpha$ for all $\alpha$ in $S$. In this section we will often write $f(i)$ for $f_i$ to avoid complicated subscripts.

We define $\mathcal{A}_K$ to be the subspace of $R^I$ whose members are sums of functions depending only on blocs of loci in $K$. Thus

(2.1)    $\mathcal{A}_S = \{\xi \in R^I : \text{for some } \varphi \in R^{I_S},\ \xi(i) = \varphi(i_S) \text{ for all } i \in I\}$.

(2.2)    $\mathcal{A}_K = \sum\{\mathcal{A}_S : S \in K\}$.

Thus, $\xi \in \mathcal{A}_S$ if it depends only on the loci in $S$, and $\xi \in \mathcal{A}_K$ if there exists $\varphi^S \in R^{I_S}$ for all $S \in K$ such that

(2.3)    $\xi(i) = \sum\{\varphi^S(i_S) : S \in K\}$.

In particular, $\mathcal{A}_\emptyset$ consists of the constant functions.

It is clear that if $K_1$ and $K_2$ are complexes then the union $K_1 \vee K_2$ is a complex and (2.2) implies:

(2.4)    $\mathcal{A}_{K_1 \vee K_2} = \mathcal{A}_{K_1} + \mathcal{A}_{K_2}$.

If $i,j \in I$ and $S \subset L$, $i_S j_{\tilde S}$ denotes the element of $I$ whose $\alpha$ coordinate is $i_\alpha$ for $\alpha \in S$ and $j_\alpha$ for $\alpha \in \tilde S = L - S$. Fix $j \in I$. Clearly, $\xi \in \mathcal{A}_S$ iff $\xi(i) = \xi(i_S j_{\tilde S})$ for all $i$, since this means that the $\tilde S$ coordinates are irrelevant to the value of $\xi$. So we are led to define the following linear maps $P^j_S: R^I \to R^I$ and $D^j_S = 1 - P^j_S: R^I \to R^I$ by:

(2.5)    $P^j_S(\xi)(i) = \xi(i_S j_{\tilde S})$,    $D^j_S(\xi)(i) = \xi(i) - \xi(i_S j_{\tilde S})$    for $i \in I$.

Clearly, $(P^j_S)^2 = P^j_S$. $P^j_S$ is a projection and $D^j_S$ is the complementary projection. We recall some of the elementary properties of projections (see Halmos, [13, Sec. 41]).


1 Lemma: (a) If $P$ is a projection on a vector space $V$ then $\mathrm{Im}\,P = \mathrm{Ker}(1 - P) = \{\xi \in V : P\xi = \xi\}$ and $\mathrm{Ker}\,P = \mathrm{Im}(1 - P) = \{\xi \in V : P\xi = 0\}$. $V$ is the direct sum of $\mathrm{Ker}\,P$ and $\mathrm{Im}\,P$.

(b) If $P_1$ and $P_2$ are projections which commute ($P_1P_2 = P_2P_1$) then $P_1P_2$ is a projection commuting with both $P_1$ and $P_2$ and

(2.6a)    $\mathrm{Ker}\,P_1P_2 = P_2^{-1}(\mathrm{Ker}\,P_1) = \mathrm{Ker}\,P_1 + \mathrm{Ker}\,P_2$

(2.6b)    $\mathrm{Im}\,P_1P_2 = P_1(\mathrm{Im}\,P_2) = (\mathrm{Im}\,P_1) \cap (\mathrm{Im}\,P_2)$

(2.6c)    $P_2^{-1}(\mathrm{Im}\,P_1) = \mathrm{Im}\,P_1 + \mathrm{Ker}\,P_2$

(2.6d)    $P_1(\mathrm{Ker}\,P_2) = (\mathrm{Im}\,P_1) \cap (\mathrm{Ker}\,P_2)$

(c) If $\{P_i\}$ and $\{P_j\}$ are two finite families of projections all commuting with one another then:

(2.7)    $(1 - \prod_i P_i)(1 - \prod_j P_j) = 1 - \prod_{i,j}(1 - (1 - P_i)(1 - P_j))$

If $V_i = \mathrm{Ker}\,P_i$, $U_i = \mathrm{Im}\,P_i$ and similarly for $V_j$ and $U_j$, then:

(2.8a)    $(\sum_i V_i) \cap (\sum_j V_j) = \sum_{i,j}(V_i \cap V_j)$

(2.8b)    $(\bigcap_i U_i) + (\bigcap_j U_j) = \bigcap_{i,j}(U_i + U_j)$.

Proof: (a): $(1 - P)^2 = 1 - P$ and $P(1 - P) = (1 - P)P = 0$. Finally, if $\xi \in V$, $\xi = P\xi + (1 - P)\xi$ writes $\xi$ uniquely as a sum from the image and the kernel of $P$.

(b): $(P_1P_2)^2 = P_1P_2$. $\mathrm{Ker}\,P_1P_2 \supset \mathrm{Ker}\,P_2$ and $\mathrm{Im}(P_1P_2) \subset \mathrm{Im}\,P_1$. Since $P_1P_2 = P_2P_1$, $\mathrm{Ker}\,P_1P_2 \supset \mathrm{Ker}\,P_1$ and $\mathrm{Im}(P_1P_2) \subset \mathrm{Im}\,P_2$. This proves half of (2.6a) and (2.6b). The other direction follows from:

$1 - P_1P_2 = (1 - P_1) + (1 - P_2) - (1 - P_1)(1 - P_2)$.

So if $\xi \in \mathrm{Ker}\,P_1P_2$, $\xi = (1 - P_1)\xi + (1 - P_2)\xi - (1 - P_1)(1 - P_2)\xi$ and so is in $\mathrm{Im}(1 - P_1) + \mathrm{Im}(1 - P_2) = \mathrm{Ker}\,P_1 + \mathrm{Ker}\,P_2$. If $\xi \in (\mathrm{Im}\,P_1) \cap (\mathrm{Im}\,P_2) = \mathrm{Ker}(1 - P_1) \cap \mathrm{Ker}(1 - P_2)$ then $(1 - P_1P_2)\xi = 0$ and so $\xi \in \mathrm{Im}\,P_1P_2$.

(2.6c) follows from (2.6a) applied to $1 - P_1$ and $P_2$. Similarly (2.6d) follows from (2.6b).

(c): We first note that if $P_0$ commutes with the family $\{P_i\}$ then

$\prod_i (P_0 + (1 - P_0)P_i) = P_0 + (1 - P_0)\prod_i P_i$.

For if we expand the product on the left we get $P_0$ and $(1 - P_0)\prod_i P_i$ as end terms with all of the cross terms divisible by $P_0(1 - P_0) = 0$. Now we apply this equation twice:

$\prod_i P_i + (1 - \prod_i P_i)\prod_j P_j = \prod_j(\prod_i P_i + (1 - \prod_i P_i)P_j)$
$= \prod_j(P_j + (1 - P_j)\prod_i P_i) = \prod_{i,j}(P_j + (1 - P_j)P_i)$
$= \prod_{i,j}(1 - (1 - P_i)(1 - P_j))$.

This proves (2.7). (2.8a) follows by taking the image of both sides and (2.8b) follows by taking kernels. The equations are derived using (2.6a) and (2.6b). QED

2 Proposition: Fix $j \in I$.

(a) As $S$ varies over the subsets of $L$ the projections $P^j_S$ and $D^j_S$ all commute with one another.

(b) For $K$ a complex define $D^j_K = \prod\{D^j_S : S \in K\}$. $D^j_K$ is a projection with kernel equal to $\mathcal{A}_K$. We let $P^j_K$ denote the complementary projection $1 - D^j_K$.

(c) For complexes $K_1$ and $K_2$, the intersection $K_1 \wedge K_2$ is a complex and:

(2.9)    $\mathcal{A}_{K_1 \wedge K_2} = \mathcal{A}_{K_1} \cap \mathcal{A}_{K_2}$.
Proof: (a): Since $D^j_S = 1 - P^j_S$ it suffices to check commutativity among the $P^j_S$'s. Here it follows from:

$P^j_{S_1 \cap S_2} = P^j_{S_1} P^j_{S_2}$.

(b): We saw above that $\mathcal{A}_S$ is the kernel of $D^j_S$. The general result now follows from (2.2) and (2.6a).

(c): We prove the following identities:

(2.10a)    $D^j_{K_1} D^j_{K_2} = D^j_{K_1 \vee K_2}$

(2.10b)    $P^j_{K_1} P^j_{K_2} = P^j_{K_1 \wedge K_2}$

The first is clear from the definition of the projections $D^j_K$. Lemma 1(c) implies that with $S_1$ varying over $K_1$ and $S_2$ over $K_2$:

$P^j_{K_1} P^j_{K_2} = [1 - \prod_{S_1}(1 - P^j_{S_1})][1 - \prod_{S_2}(1 - P^j_{S_2})] = 1 - \prod_{S_1,S_2}(1 - P^j_{S_1} P^j_{S_2})$.

By the identity in the proof of (a) this is $1 - \prod_{S_1,S_2}(1 - P^j_{S_1 \cap S_2})$. This is $P^j_{K_1 \wedge K_2}$ because as $S_1$ and $S_2$ vary, $S_1 \cap S_2$ varies over the sets of $K_1 \wedge K_2$.

(2.9) follows from (2.10b) by taking images and applying (2.6b).

QED
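The projections of (2.5) are simple to realize on a finite product. The sketch below assumes three loci with 2, 3 and 2 alleles and represents an element of $R^I$ as a dictionary keyed by index tuples; the sizes, the base point `j` and the function names `mix`, `P`, `D` are illustrative only.

```python
import itertools
import random

SIZES = (2, 3, 2)                       # illustrative: three loci
I = list(itertools.product(*[range(n) for n in SIZES]))

def mix(i, j, S):
    """The index i_S j_S~: coordinates of i on S, coordinates of j off S."""
    return tuple(i[a] if a in S else j[a] for a in range(len(SIZES)))

def P(j, S, xi):
    """P^j_S(xi)(i) = xi(i_S j_S~), as in (2.5)."""
    return {i: xi[mix(i, j, S)] for i in I}

def D(j, S, xi):
    """Complementary projection D^j_S = 1 - P^j_S."""
    Pxi = P(j, S, xi)
    return {i: xi[i] - Pxi[i] for i in I}

random.seed(0)
xi = {i: random.random() for i in I}    # an arbitrary element of R^I
j = (1, 2, 0)                           # the fixed base point
S1, S2 = {0, 1}, {1, 2}

lhs = P(j, S1, P(j, S2, xi))            # P^j_{S1} P^j_{S2}
rhs = P(j, S1 & S2, xi)                 # P^j_{S1 ∩ S2}
```

Because `P` only relabels indices, `lhs` and `rhs` agree exactly, which is the identity used in part (a) of the proof.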


Thus, $P^j_K$ for each $j$ is a projection to $\mathcal{A}_K$. In applications we usually want projections which are orthogonal with respect to the covariance metric associated to a distribution $p$, i.e. to ${}_p(\ ,\ )$ (see Prop. I.4.3). In the special case when the different loci are independent we can get this projection from the $P^j_K$'s and they have nice properties:

3 Proposition: For any $p \in \Delta$, define $D^p_K = \sum_j p_j D^j_K$ and $P^p_K = \sum_j p_j P^j_K = 1 - D^p_K$. $D^p_K$ and $P^p_K$ are complementary projections with $\mathrm{Im}\,P^p_K = \mathrm{Ker}\,D^p_K = \mathcal{A}_K$. If $K_1$ is any other complex

(2.11b)    $D^p_K(\mathcal{A}_{K_1}) \subset \mathcal{A}_{K_1 \setminus K}$    ($K_1 \setminus K = \{S : S \subset S_1 \in K_1 - K\}$).

For $K = \{\emptyset\}$, $P^p_\emptyset(\xi)$ is the mean $\bar\xi = \sum_j p_j \xi_j$.

Now assume that $p$ is a member of the Wright manifold $\Lambda$, i.e. $p_i = \prod_\alpha p^\alpha_{i_\alpha}$ where $p^\alpha$ is the marginal distribution on the factor $I_\alpha$ induced by the distribution $p$. In that case $D^p_K$ and $P^p_K$ are orthogonal projections with respect to the covariance inner product ${}_p(\ ,\ )$ on $R^I$. Furthermore, the following identities hold:

(2.12a)    $D^p_{K_1} \circ D^p_{K_2} = D^p_{K_1 \vee K_2}$

(2.12b)    $P^p_{K_1} \circ P^p_{K_2} = P^p_{K_1 \wedge K_2}$

In particular, for fixed $p \in \Lambda$ the projections $D^p_K$ and $P^p_K$ all commute.

If $p$ is the distribution concentrated at $j$, i.e. $p_i = \delta_{ij}$, then $p$ is in $\Lambda$ and $P^p_K$, $D^p_K$ are the original projections $P^j_K$, $D^j_K$.

Proof: For all $j$, $P^j_K$ maps $R^I$ into $\mathcal{A}_K$ and is the identity on $\mathcal{A}_K$. These properties are preserved in forming the convex combination $P^p_K$ and characterize a projection with image $\mathcal{A}_K$. The same convex combination argument allows us to get (2.11a) from the corresponding results for $P^j_K$. These results hold for $P^j_K$ by (2.10a) and (2.10b) and the identities of Lemma 1. If $\xi \in \mathcal{A}_{K_1}$ then $\xi = \sum\{\varphi^S : S \in K_1\}$ with $\varphi^S \in \mathcal{A}_S$. $D^j_K(\varphi^S) = 0$ for $S \in K \wedge K_1$ and $D^j_K(\varphi^S) \in \mathcal{A}_S$ for the remaining $S$. So

$D^p_K(\xi) = \sum\{D^p_K(\varphi^S) : S \in K_1 - K\} \in \sum\{\mathcal{A}_S : S \in K_1 - K\} = \mathcal{A}_{K_1 \setminus K}$.

Since $P^j_\emptyset(\xi)$ is the constant $\xi(j)$, $P^p_\emptyset(\xi)$ is the constant $\sum_j p_j \xi_j = \bar\xi$.

The proof of the identities (2.12) hinges on the special case of (2.12b) where $K_1 = S_1$ and $K_2 = S_2$:

$P^p_{S_1} \circ P^p_{S_2}(\xi)(i) = \sum_{j,k} p_j p_k\, \xi(i_{S_1 \cap S_2} k_{S_2 \cap \tilde S_1} j_{\tilde S_2})$.

Summing first $k$ over the variables not in $S_2 \cap \tilde S_1$ and $j$ over the variables in $S_2$ we get (letting, for example, $p(j_{\tilde S_2})$ stand for the probability of $j_{\tilde S_2}$):

$\sum p(j_{\tilde S_2})\, p(k_{S_2 \cap \tilde S_1})\, \xi(i_{S_1 \cap S_2} k_{S_2 \cap \tilde S_1} j_{\tilde S_2})$.

Now because the different loci are independent, $p(j_{\tilde S_2})\, p(k_{S_2 \cap \tilde S_1}) = p(j_{\tilde S_2} k_{S_2 \cap \tilde S_1})$, and as $j_{\tilde S_2}$ and $k_{S_2 \cap \tilde S_1}$ range over $I_{\tilde S_2}$ and $I_{S_2 \cap \tilde S_1}$ the combined variable, which we will call $r_{\tilde S_1 \cup \tilde S_2}$, ranges over $I_{\tilde S_1 \cup \tilde S_2}$. So we continue the chain of equations:

$= \sum p(r_{\tilde S_1 \cup \tilde S_2})\, \xi(i_{S_1 \cap S_2} r_{\tilde S_1 \cup \tilde S_2}) = P^p_{S_1 \cap S_2}(\xi)(i)$.

By induction it easily follows for a finite collection of sets $\{S_\mu\}$ with intersection $S$ that

(2.13)    $\sum_j p_j \prod_\mu P^j_{S_\mu} = \prod_\mu P^p_{S_\mu}$.

In other words the average of the product is the product of the averages. Now $D^j_K$ is $\prod\{1 - P^j_S : S \in K\}$. Expand out the product, average over $j$, apply (2.13) and pack the product back together and we get

(2.14)    $D^p_K = \prod\{1 - P^p_S : S \in K\}$.

From this (2.12a) is clear. We can rewrite (2.12b) as

$(1 - D^p_{K_1}) \circ (1 - D^p_{K_2}) = 1 - D^p_{K_1 \wedge K_2}$.

This is true when $p = j$ by (2.10b). Expanding out the product, averaging over $j$ and applying (2.12a) we get that it holds for all $p$.

Next we show that $P^p_S$ is self-adjoint with respect to ${}_p(\ ,\ )$. Recall that a projection $P$ is an orthogonal projection if it is self-adjoint, because if $\xi \in \mathrm{Im}\,P$ and $\eta \in \mathrm{Ker}\,P$ then $(\xi,\eta) = (P\xi,\eta) = (\xi,P\eta) = 0$ and so $\mathrm{Ker}\,P$ is the orthogonal complement of $\mathrm{Im}\,P$ (see Halmos, [13, Sec. 75]). Since any algebraic combination of commuting self-adjoint operators is again self-adjoint it follows that $D^p_K$ and $P^p_K$ are self-adjoint and so are orthogonal projections. Now if $\xi,\eta \in R^I$, then

${}_p(P^p_S \xi, \eta) = \sum p(i)p(j)\, \xi(i_S j_{\tilde S})\, \eta(i) = \sum p(i)p(j_{\tilde S})\, \xi(i_S j_{\tilde S})\, \eta(i)$.

Since the loci are independent, $p(i) = p(i_S)p(i_{\tilde S})$ and so

$= \sum p(i_S)p(i_{\tilde S})p(j_{\tilde S})\, \xi(i_S j_{\tilde S})\, \eta(i_S i_{\tilde S})$,

summed over the variables $i_S \in I_S$, $i_{\tilde S} \in I_{\tilde S}$ and $j_{\tilde S} \in I_{\tilde S}$. The expression on the right is symmetric in $\xi$ and $\eta$ and so ${}_p(P^p_S \xi, \eta) = {}_p(P^p_S \eta, \xi) = {}_p(\xi, P^p_S \eta)$.

QED


4 Corollary: Let $n$ be the number of elements in $I$. The map $P_K = n^{-1} \sum_j P^j_K$ is the projection of $R^I$ on $\mathcal{A}_K$ orthogonal with respect to the usual inner product $(\ ,\ )$ on $R^I$.

Proof: Let $n_\alpha$ be the cardinality of $I_\alpha$. Then $n = \prod_\alpha n_\alpha$ and the uniform probability distribution $p_i = n^{-1}$ has independent loci with marginal probabilities $p^\alpha_{i_\alpha} = n_\alpha^{-1}$. So Prop. 3 applies. With the uniform probability distribution $p$, ${}_p(\ ,\ ) = n^{-1}(\ ,\ )$ and so orthogonality with respect to these two inner products is the same. QED
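As a concrete check of Corollary 4, the orthogonal projection onto $\mathcal{A}_K$ can be computed from (2.14) with $p$ uniform, where each $P^p_S$ simply averages over the loci outside $S$. The sketch assumes two loci with two alleles each and the complex $K$ of single loci, so that $\mathcal{A}_K$ is the space of additive functions; the data and function names are illustrative.

```python
import itertools

SIZES = (2, 2)                          # illustrative: two loci, two alleles
I = list(itertools.product(*[range(n) for n in SIZES]))
n = len(I)

def mix(i, j, S):
    """The index i_S j_S~: coordinates of i on S, coordinates of j off S."""
    return tuple(i[a] if a in S else j[a] for a in range(len(SIZES)))

def P_S(S, xi):
    """Uniform average of P^j_S over j: averages xi over the loci outside S."""
    return {i: sum(xi[mix(i, j, S)] for j in I) / n for i in I}

def P_K(K, xi):
    """P_K = 1 - prod_{S in K} (1 - P_S), i.e. (2.14) with p uniform."""
    d = dict(xi)
    for S in K:
        avg = P_S(S, d)
        d = {i: d[i] - avg[i] for i in I}
    return {i: xi[i] - d[i] for i in I}

K = [frozenset(), frozenset({0}), frozenset({1})]   # complex of single loci
xi = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 9.0}

additive = P_K(K, xi)                               # component in A_K
residual = {i: xi[i] - additive[i] for i in I}      # component in B
```

For this input the additive part is `(0, 5, 3, 8)` in the order of `I` and the residual is the interaction vector `(1, -1, -1, 1)`; the two are orthogonal in the usual inner product, as the corollary requires.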

5 Definition: (a) For $\xi \in R^I$ the carrier of $\xi$, $K(\xi)$, is the smallest complex $K$ such that $\xi \in \mathcal{A}_K$. Equivalently (by (2.9)) $K(\xi) = \wedge\{K : \xi \in \mathcal{A}_K\}$. More generally, if $T$ is a subset of $R^I$ the carrier of $T$, $K(T)$, is the smallest complex $K$ such that $T \subset \mathcal{A}_K$. So $K(T) = \wedge\{K : T \subset \mathcal{A}_K\} = \vee\{K(\xi) : \xi \in T\}$. If $m$ is a function with values in $R^I$ then the carrier of $m$, $K(m)$, is the carrier of the image of $m$.

(b) For $S \subset L$ the dimension of $S$, $\dim S$, is the number of points of $S$ minus 1. For a complex $K$ the dimension, $\dim K$, is the maximum $\dim S$ for $S \in K$. For $\xi \in R^I$ or $m$ a function to $R^I$, $\dim \xi$ or $\dim m$ is the dimension of the carrier of $\xi$ or $m$, respectively.

We defined the map $E^S: R^I \to R_S = R^{I_S}$ (cf. I.(0.2)):

(2.15)    $E^S(x)(i_S) = \sum\{x(i_S j_{\tilde S}) : j_{\tilde S} \in I_{\tilde S}\}$.

So if $x = p \in \Delta$, i.e. $p$ is a distribution on $I$, $E^S(p)$ is the distribution on $I_S$ induced by the projection from $I$ to $I_S$. For example, if $S = \alpha$, $E^\alpha(p)$ is the marginal distribution on the factor $I_\alpha$. In these cases, we will write $p(i_S)$ or $p^S(i_S)$ for $E^S(p)(i_S)$.
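The marginalization map (2.15) amounts to grouping indices by their $S$ coordinates and summing. A minimal sketch, assuming two loci with 2 and 3 alleles and a dictionary representation of $x$; the distribution and the names `restrict` and `E` are hypothetical.

```python
import itertools

SIZES = (2, 3)                          # illustrative: two loci
I = list(itertools.product(*[range(n) for n in SIZES]))

def restrict(i, S):
    """i_S: the coordinates of i at the loci in S, in increasing order."""
    return tuple(i[a] for a in sorted(S))

def E(S, x):
    """E^S(x)(i_S) = sum of x over all indices agreeing with i_S on S, as in (2.15)."""
    out = {}
    for i in I:
        key = restrict(i, S)
        out[key] = out.get(key, 0.0) + x[i]
    return out

p = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
     (1, 0): 0.25, (1, 1): 0.05, (1, 2): 0.30}

first = E({0}, p)       # marginal distribution on the first locus
second = E({1}, p)      # marginal distribution on the second locus
total = E(set(), p)     # S empty: the single number |p|
```

Both marginals have the same total mass as `p`, which is the simplest instance of agreement between projections to a common subproduct (here the empty one).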

Now define for a complex $K$

$R_K = \prod\{R_S : S \in K\}$

$E^K = \prod\{E^S : S \in K\} : R^I \to R_K$.

Note as a particular convention $R_\emptyset = R$ with $E^\emptyset(x) = |x| = \sum_i x_i$.

We are now going to apply the results of section 1 to the orthogonal decomposition with $A = \mathcal{A}_K$. Our earlier projection results enable us to analyze $B$. With $i, j \in I$ we inductively define $b^K_{i,j} \in R^I$, using the Kronecker delta notation ($\delta_{ij} = 0$ if $i \ne j$ and $= 1$ if $i = j$):

(2.16)    $b^\emptyset_{i,j}(k) = \delta_{ik} - \delta_{jk}$,
          $b^{K \vee S}_{i,j} = b^K_{i,j} - b^K_{\bar\imath,j}$    ($\bar\imath = i_S j_{\tilde S}$).

6 Lemma: Let $p \in \Delta^\circ$, and $X, \xi \in R^I$ with $X_k = p_k \xi_k$ for all $k \in I$. Let $b = b^K_{i,j}$. Then

$D^j_K(\xi)(i) = (b, \xi) = (\nabla_p L^b, X)_p$.

Proof: The second equation follows from the definition of the Shahshahani metric and the computation of $\nabla L^b$ in Table I.4.4. The first equation is clear for $K = \emptyset$ and then follows by induction on the number of elements of $K$ since:

$D^j_{K \vee S}(\xi)(i) = D^j_S D^j_K(\xi)(i) = D^j_K(\xi)(i) - D^j_K(\xi)(i_S j_{\tilde S}) = (b^K_{i,j}, \xi) - (b^K_{\bar\imath,j}, \xi) = (b^{K \vee S}_{i,j}, \xi)$.

QED
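The recursion (2.16) and the first equation of Lemma 6 can be traced in a few lines. In the sketch below the complex is passed as a list of its sets, and the recursion is started one step earlier than in (2.16), from the indicator of $i$, so that the step for $S = \emptyset$ reproduces the base case $\delta_{ik} - \delta_{jk}$; this reorganization, the 2 x 2 index set and the function names are assumptions made for illustration.

```python
import itertools

SIZES = (2, 2)                          # illustrative: two loci, two alleles
I = list(itertools.product(*[range(n) for n in SIZES]))

def mix(i, j, S):
    """The index i_S j_S~: coordinates of i on S, coordinates of j off S."""
    return tuple(i[a] if a in S else j[a] for a in range(len(SIZES)))

def b_vec(K, i, j):
    """Vector b with (b, xi) = D^j_K(xi)(i); K lists the sets S of the complex."""
    if not K:
        return {k: 1.0 if k == i else 0.0 for k in I}
    *rest, S = K
    first = b_vec(rest, i, j)
    second = b_vec(rest, mix(i, j, S), j)   # the term with i replaced by i_S j_S~
    return {k: first[k] - second[k] for k in I}

def D_K(K, j, xi):
    """D^j_K = product of the D^j_S, S in K, applied to xi."""
    for S in K:
        xi = {i: xi[i] - xi[mix(i, j, S)] for i in I}
    return xi

K = [frozenset(), frozenset({0}), frozenset({1})]   # complex of single loci
i, j = (0, 0), (1, 1)
b = b_vec(K, i, j)

xi = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 9.0}
pairing = sum(b[k] * xi[k] for k in I)              # (b, xi)
```

For this complex `b` comes out as the interaction vector `(1, -1, -1, 1)`, and `pairing` agrees with `D_K(K, j, xi)[i]`; any additive function pairs to zero with `b`.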


7 Theorem: (a) The image of $E^K: R^I \to R_K$ is the subspace $V_K$ defined by the obvious consistency conditions. That is, we say that $\{x^S\} \in R_K$ lies in $V_K$ if $S_1, S_2 \in K$ imply $E^{S_1 \cap S_2}(x^{S_1}) = E^{S_1 \cap S_2}(x^{S_2})$ where we get $E^{S_1 \cap S_2}(x^{S_1})$, for example, by summing on the variables of $S_1 - (S_1 \cap S_2)$. In particular, $|x^{S_1}| = |x^{S_2}|$ and this common value defines $|\{x^S\}|$ for $\{x^S\} \in V_K$.

(b) $E^K$ maps $\Delta^\circ$ onto a convex set denoted $\Delta_K^\circ$ which is open in the hypersurface of $V_K$ defined by $|\{x^S\}| = 1$. The closure of $\Delta_K^\circ$, denoted $\Delta_K$, is the image of $\Delta$.

(c) The kernel of $E^K$ is the subspace $B_K$ of $R^I$ spanned by $\{b^K_{i,j} : i,j \in I\}$. Define $L^B: \Delta^\circ \to R^d$ by choosing a basis for $B_K$. Then the diagram

commutes and $E^K \times L^B$ is a diffeomorphism.

The fibres of $E^K$ form the foliation $\mathcal{D}_K$ of $\Delta^\circ$ and the fibres of $L^B$ form the transverse foliation $\bar{\mathcal{T}}_K$. $\mathcal{D}_K$ and $\bar{\mathcal{T}}_K$ are orthogonal with respect to the Shahshahani metric.


Proof: (a): We prove this in two parts.

(i) When $K$ consists of all proper subsets of $L$.

This case we prove by induction on the number of loci. We are given $\{y^\alpha(i_1 \dots \hat\imath_\alpha \dots i_\ell) : \alpha = 1,\dots,\ell\} \in V_K$ where $y^\alpha$ is a function of all of the variables except $i_\alpha$, and we want to find $x \in R^I$ such that summing $x$ in the $\alpha$ variable, we get:

$x(i_1 \dots *_\alpha \dots i_\ell) = y^\alpha(i_1 \dots \hat\imath_\alpha \dots i_\ell)$.

Here an asterisk denotes summation over the indices of the labelled locus.

By inductive hypothesis we can define for each value of $i_\ell$ a function $z$ defined on $i_1 \dots i_{\ell-1}$ which sums properly for all $\alpha$'s except $\alpha = \ell$. So we define

$\bar y^\alpha(i_1 \dots \hat\imath_\alpha \dots i_\ell) = y^\alpha(i_1 \dots \hat\imath_\alpha \dots i_\ell) - z(i_1 \dots *_\alpha \dots i_\ell)$.

Then $\bar y^\alpha$ is identically $0$ for $\alpha = 1,\dots,\ell-1$ and by the consistency conditions on $\{\bar y^\alpha\}$,

$\bar y^\ell(i_1 \dots *_\alpha \dots i_{\ell-1}) = \bar y^\alpha(i_1 \dots \hat\imath_\alpha \dots *_\ell) = 0$    ($\alpha < \ell$).

Now define $q(i_1 \dots i_\ell) = \bar y^\ell(i_1 \dots i_{\ell-1})/n_\ell$. Then

$q(i_1 \dots *_\alpha \dots i_\ell) = \bar y^\ell(i_1 \dots *_\alpha \dots i_{\ell-1})/n_\ell = 0$    ($\alpha < \ell$),

by the above equation for $\bar y^\ell$. Also

$q(i_1 \dots i_{\ell-1} *_\ell) = \bar y^\ell(i_1 \dots i_{\ell-1})$.

Thus, $x = z + q$ is a solution with $\ell$ loci for $\{y^\alpha\}$.

(ii) General Case: If $K = L$ it is clear that $V_K$ can be identified with $R^I$. So by adding subsets $S$ one at a time it suffices to show that $V_{K \vee S}$ maps onto $V_K$ when all proper subsets of $S$ (the boundary of $S$) lie in $K$. So we are given a consistent family $\{x^T \in R_T : T \in K\}$ and want to find $x^S \in R_S$ consistent with this family. We can find $x^S \in R_S$ consistent with $\{x^T : T \subsetneq S\}$ by applying case (i) to the set of loci in $S$. But if $T$ is any element of $K$, $T_1 = T \cap S \subsetneq S$. $x^T$ projects to $x^{T_1}$ by consistency of the original family while $x^S$ projects to $x^{T_1}$ by consistency on $S$. Thus, $x^S$ is consistent with the entire original family.

(b) - (c): These are direct applications of Thm. 1.1 and results which follow it. The proof is a matter of checking that with $A = \mathcal{A}_K$, $B_K$ is the orthogonal complement of $A$ and $E^K$ is a version of $E^A$. First, Lemma 6 and Prop. 2(b) imply that $B_K$ is the orthogonal complement of $\mathcal{A}_K$. The coordinate functions of the maps $E^S$ are of the form $E^a$ with $a = a^S_{k_S}$, where:

(2.17)    $a^S_{k_S}(i) = \delta_{i_S k_S}$    ($k_S \in I_S$).

Thus $a^S_{k_S}(i) = 1$ when $i$ has the specified values $k_\alpha$ at the $\alpha$ positions for all $\alpha \in S$ and $= 0$ otherwise. Regarded as a function of $i$, $a^S_{k_S}$ clearly depends only on $i_S$ and so $a^S_{k_S} \in \mathcal{A}_S$. On the other hand, if $\varphi \in R_S$ and $\xi(i) = \varphi(i_S)$ then

$\xi = \sum\{\varphi(k_S)\, a^S_{k_S} : k_S \in I_S\}$.

Thus, the vectors $\{a^S_{k_S} : k_S \in I_S\}$ span $\mathcal{A}_S$ and the union of these sets with $S \in K$ spans $\mathcal{A}_K$.

QED


Remark: The description of the subspace $V_K$ of $R_K$ by the linear consistency conditions in part (a) solves the Image Problem referred to after Thm. 1.1 for the case $A = \mathcal{A}_K$. This in turn leads to an obvious conjecture about the description of $\Delta_K$. Suppose $\{p^S\}_{S \in K}$ is a collection of probability distributions on the subsets $S$, satisfying the consistency conditions, i.e. $p^{S_1}$ on $I_{S_1}$ and $p^{S_2}$ on $I_{S_2}$ induce the same distribution $p^{S_1 \cap S_2}$ on the common subproduct $I_{S_1 \cap S_2}$. Is every such collection the image of a distribution $p$ in $\Delta$, i.e. does there exist a distribution $p$ on $I$ inducing the given $p^S$ on the subproduct $I_S$ for all $S$ in $K$? Part (a) says that there exists $x \in R^I$ with $E^S(x) = p^S$ for all $S$ and clearly $\sum_i x_i = |\{p^S\}| = 1$. It is not clear from (a) that $x$ can be chosen with $x_i \ge 0$ for all $i$. But it does seem reasonable that the positivity of all of the $p^S$'s should allow us to choose some positive $x$, i.e. an element of $\Delta^\circ$. Reasonable, yes, but false. A counterexample is given in the Appendix.

8 Theorem: Let $X(p) = \sum_i p_i \xi_i(p) \partial_i$ be a vectorfield on $\Delta^\circ$. So $\xi_i$ is a function of $p$ for each $i$ and $\bar\xi = \sum_i p_i \xi_i = 0$. Let $K$ be a complex of subsets of $L = \{1,\dots,\ell\}$. The following conditions are equivalent:

(a) The carrier of $\xi$ is contained in $K$.

(b) For each $p \in \Delta^\circ$, $\xi(p) \in \mathcal{A}_K$.

(c) For all $S \in K$ there exist functions $\varphi^S: \Delta^\circ \to R_S$ as smooth as $\xi$ such that $\xi(i) = \sum\{\varphi^S(i_S) : S \in K\}$ at every point $p \in \Delta^\circ$.

(d) For all $i,j \in I$ and every point $p \in \Delta^\circ$ we have $\sum_k b^K_{i,j}(k)\, \xi(k) = 0$.

(e) The vectorfield $X$ is everywhere tangent to the transverse foliation $\bar{\mathcal{T}}_K$.

(f) For every vector $b$ in the vector space $B_K$ spanned by $\{b^K_{i,j} : i,j \in I\}$, the function $L^b$ is an integral of the motion for the vectorfield $X$.

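If $X$ is the gradient of a function $f: \Delta^\circ \to R$, i.e. $X = \bar\nabla f$ or, equivalently, $\xi_i = \partial f/\partial x_i - \sum_j p_j\, \partial f/\partial x_j$ at every point $p$ of $\Delta^\circ$, then the above conditions are further equivalent to:

(g) The function $f$ factors (uniquely) through the map $E^K$.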

Proof: Except for the smoothness in (c) the equivalence of (a) - (c) is a matter of definitions. They are equivalent to (d) because $B_K$ is the orthogonal complement of $\mathcal{A}_K$ with respect to $(\ ,\ )$. (b) is equivalent to (e) by Addendum 1.3(a). (d) is equivalent to (f) because $L^b$ is an integral of the motion iff the gradient of $L^b$ is orthogonal to $X$ at all points. In the gradient case equivalence with (g) follows from Cor. 1.4 and the Remark after it.

The smoothness portion of (c) is proved by induction on the number of subsets in $K$. If $K = S$, the projection from $I$ to $I_S$ induces an isomorphism from $R_S = R^{I_S}$ onto $\mathcal{A}_S$ and we define $\varphi^S$ to be the composition of $\xi$ with the inverse isomorphism (and $\varphi^{S_1} = 0$ for $S_1 \subsetneq S$). If $K = K_1 \vee S$ and $P_{K_1}$, $D_{K_1}$ are defined as in Cor. 4, then $P_{K_1} \circ \xi$ maps into $\mathcal{A}_{K_1}$ and by (2.11b) $D_{K_1} \circ \xi$ maps to $\mathcal{A}_S$. Applying the initial step to the latter and the inductive hypothesis to the former we decompose $\xi = P_{K_1} \circ \xi + D_{K_1} \circ \xi$ as the sum of functions as smooth as $\xi$.

QED

Recall that the selection field is the gradient $\nabla(\tfrac{1}{2}\bar{m})$ where
$\bar{m} = \sum_{i,j} p_i p_j m_{ij}$ is mean zygotic fitness. $m_{ij}$ can be regarded as an
element of $\mathbb{R}^{I \times I}$ where, in addition, $m_{ij}$ is symmetric in $i$ and $j$.

9 Theorem: Let $I = \prod_{\alpha=1}^{\ell} I_\alpha$ and consider $m_{ij}$ a symmetric function on
$I \times I$ (i.e. $m_{ij} = m_{ji}$ for $i,j \in I$). Let $K$ be a complex in $L$. The
following are equivalent:

(a) On $\mathring{\Delta}$, the function $\bar{m}(p) = \sum_{i,j} p_i p_j m_{ij}$ factors through $E^K$.

(b) $m_i(p) = \sum_j p_j m_{ij}$ is a function of $p$ from $\mathring{\Delta}$ to $\mathcal{L}_K$.

(c) For every $S \in K$ there exists $\varphi^S : \mathring{\Delta} \to \mathbb{R}^{I_S}$ such that
$m_i(p) = \sum_{S \in K} \varphi^S(p)_{i_S}$.

(d) For every $j \in I$, $m_{ij} \in \mathcal{L}_K$ as a function of $i$.

(e) For all $S,T \in K$, there exists $m^{ST} \in \mathbb{R}^{I_S \times I_T}$ such that
$m_{ij} = \sum_{S,T} m^{ST}(i_S, j_T)$. Furthermore, we can assume that
$m^{ST}(i_S, j_T) = m^{TS}(j_T, i_S)$ for all $S,T \in K$.

Recall that $m_{ij}$ is completely symmetric if whenever $Q \subset L$,
$\bar{i} = i_Q j_{\tilde{Q}}$ and $\bar{j} = j_Q i_{\tilde{Q}}$ then $m_{ij} = m_{\bar{i}\bar{j}}$. If $m_{ij}$ is completely symmetric
we can strengthen (e) to:

(e$_{sym}$) For all $S \in K$ there exists $m^S \in \mathbb{R}^{I_S \times I_S}$ completely
symmetric in the variables of $I_S$ such that $m_{ij} = \sum_S m^S(i_S, j_S)$.


Proof : The equivalence of (a) - (d) follows from Thm. 8 and Cor. 1.5.
Clearly, (e) implies (d). On the other hand, (d) implies that for
every $j \in I$ and $S \in K$ there exists $m^S_j \in \mathbb{R}^{I_S}$ such that
$m_{ij} = \sum\{m^S_j(i_S) : S \in K\}$. Now consider $m^S_j(i_S)$ as a function of $j$, i.e. as
an element of $\mathbb{R}^I$ with $i_S$ fixed, and project to $\mathcal{L}_K$. For every $T \in K$
there exists $m^{ST} \in \mathbb{R}^{I_S \times I_T}$ such that:

$\sum\{m^{ST}(i_S, j_T) : T \in K\} = P_K(m^S_{\cdot}(i_S))(j)$.

Sum on $S$:

$\sum\{m^{ST}(i_S, j_T) : S,T \in K\} = P_K(m_{i\,\cdot})(j) = m_{ij}$.

The last equality because by symmetry and (d) $m_{ij} \in \mathcal{L}_K$ as a function
of $j$. Finally, by symmetry:

$m_{ij} = \tfrac{1}{2}(m_{ij} + m_{ji}) = \sum\{\tfrac{1}{2}[m^{ST}(i_S, j_T) + m^{TS}(j_T, i_S)] : S,T \in K\}$.

So we can replace $m^{ST}(i_S, j_T)$ by the bracketed term to get symmetry in
(e).

The complete symmetry result is less direct. For $Q \subset L$, define
$T_Q(m_{ij}) = m_{\bar{i}\bar{j}}$ (where $\bar{i}$ and $\bar{j}$ are defined in the statement). $T_Q$ is
a linear operator on $\mathbb{R}^{I \times I}$ with $(T_Q)^2$ = identity.

The index set for $I \times I$ is two copies of $L$. We can indicate
subsets of the double of $L$ by ordered products $S \times T$ with $S,T \subset L$.
If $K_1$ and $K_2$ are complexes on $L$ then $K_1 \times K_2 = \{S \times T : S \in K_1$ and
$T \in K_2\}$ is a complex on the double of $L$. $T_Q$ acts on subsets of the
double and hence on complexes by:

(2.18) $T_Q(S \times T) = [(S \cap Q) \cup (T \cap \tilde{Q})] \times [(T \cap Q) \cup (S \cap \tilde{Q})]$.

Clearly, $T_Q^2$ = identity on complexes, too. In the notation of Def. 5:

(2.19) $T_Q(\mathrm{Car}(m)) = \mathrm{Car}(T_Q(m))$.

That $\mathrm{Car}(T_Q(m)) \subset T_Q \mathrm{Car}(m)$ is clear from writing

(2.20) $m(i,j) = \sum\{m^{ST}(i_S, j_T) : S \times T \in \mathrm{Car}(m)\}$

and applying $T_Q$ to both sides. Note that $T_Q(m^{ST}) \in \mathbb{R}^{I_{\bar{S}} \times I_{\bar{T}}}$ where
$\bar{S} \times \bar{T} = T_Q(S \times T)$. On the other hand, $T_Q$ is clearly monotone on
complexes, i.e. $T_Q(K_1) \subset T_Q(K_2)$ if $K_1 \subset K_2$ as complexes on the double.
Hence:

$\mathrm{Car}(m) = \mathrm{Car}(T_Q^2 m) \subset T_Q \mathrm{Car}(T_Q m) \subset T_Q^2 \mathrm{Car}(m) = \mathrm{Car}(m)$.

Since the extremes are equal these are all equal and so (2.19) holds
after $T_Q$ is applied. Since $T_Q^2$ = identity, this implies (2.19).

Now (e) implies that $\mathrm{Car}(m) \subset K \times K$. I claim that complete
symmetry implies $\mathrm{Car}(m) \subset 2K$ where:

(2.21) $2K = \{S \times T : S \cup T \in K\}$.

Complete symmetry means $T_Q(m) = m$ for all $Q \subset L$. Hence, $\mathrm{Car}(m)$
is invariant under $T_Q$ by (2.19). If $S \times T \in \mathrm{Car}(m)$ and $Q = S$ then
$T_Q(S \times T) = (S \cup T) \times (S \cap T)$ which must be in $\mathrm{Car}(m) \subset K \times K$ and so
$S \cup T \in K$. Now any function $m^{ST} \in \mathbb{R}^{I_S \times I_T}$ is a fortiori a function
$m^U \in \mathbb{R}^{I_U \times I_U}$ where $U = S \cup T$. Hence, the decomposition in (e$_{sym}$)
follows from (2.20). To get complete symmetry of the functions $m^S$
we average as before: $m_{ij} = 2^{-\ell} \sum\{T_Q(m_{ij}) : Q \subset L\}$. So we can replace
$m^S$ by $2^{-\ell} \sum\{T_Q(m^S) : Q \subset L\}$.

QED


Remark : The above proof shows that if $m_{ij}$ is symmetric and for each
$j$ fixed we regard $m_{ij}$ as a function of $i$, then $\mathrm{Car}_i(m_{ij}) \subset K$ for
all $j$ implies $\mathrm{Car}_{ij}(m_{ij}) \subset K \times K$. Furthermore, if $m_{ij}$ is completely
symmetric $\mathrm{Car}_{ij}(m_{ij}) \subset 2K$.

We saw in Prop. 3 that for $p$ in the Wright manifold $\Lambda$, the
projection of $\mathbb{R}^I$ onto $\mathcal{L}_K$ orthogonal with respect to the covariance
metric $(\ ,\ )_p$ is given by $P^p_K$. For general $p$ this projection is
difficult to compute. However, we can interpret it geometrically and
relate it to the concept of conditional expectation.

First, a technical result:


10 Lemma : Let $X_1$ and $X_2$ be Riemannian manifolds and $E: X_1 \to X_2$ be a
submersion, i.e. the derivative map $d_pE: T_pX_1 \to T_{Ep}X_2$ is an onto
linear map for all $p$ in $X_1$. Let $V_p$ (the vertical subspace) denote
the kernel of $d_pE$ and $H_p$ denote the perpendicular complement of $V_p$ in
$T_pX_1$. $d_pE$ is an isomorphism of $H_p$ on $T_{Ep}X_2$. Following O'Neill [26],
we call $E$ a Riemannian submersion if $d_pE$ is an isometry of $H_p$ on
$T_{Ep}X_2$ for all $p$. $E$ is a Riemannian submersion iff the following
condition holds:

For every smooth map $f: X_2 \to \mathbb{R}$, the gradient of $f \circ E$ at $p \in X_1$
is the unique horizontal vector mapping under $d_pE$ to the gradient of
$f$ at $Ep$, i.e.

(2.22) $d_pE(\nabla_p(f \circ E)) = \nabla_{Ep} f$.

Proof : Since $f \circ E$ is constant on the fibres of $E$, the gradient of
$f \circ E$ is always horizontal, whether $E$ is a Riemannian submersion or
not. The question is whether equation (2.22) holds.

$d_pE$ is an isomorphism on $H_p$ so we can designate by $d_pE(v)$ an
arbitrary vector in $T_{Ep}X_2$ with $v \in H_p$. Then:

$(d_pE(v), d_pE(\nabla_p(f \circ E)))_{Ep} \overset{(1)}{=} (v, \nabla_p(f \circ E))_p \overset{(2)}{=} d_p(f \circ E)(v) = d_{Ep}f(d_pE(v))$.

Equation (1) holds iff $E$ is a Riemannian submersion while (2) is
by definition of the gradient. Finally, the end terms are equal iff
(2.22) holds. The equivalence of the lemma follows because any vector
in $T_{Ep}X_2$ can be obtained as $\nabla_{Ep} f$ for some choice of smooth $f: X_2 \to \mathbb{R}$.

QED


11 Proposition : Let $p \in \mathring{\Delta}$.

(a) Identify $\mathcal{L}_S$ with $\mathbb{R}^{I_S}$ by the isomorphism induced by the
projection $I \to I_S$ for $S \subset L$. The projection of $\mathbb{R}^I$ on $\mathcal{L}_S$ which is
orthogonal with respect to the covariance metric $(\ ,\ )_p$ ($p \in \mathring{\Delta}$) is
the conditional expectation operator $E_p(\ \cdot\ |S)$, that is, the value of
the projection of $\xi$ at $i_S$ is:

(2.23) $E_p(\xi|i_S) = \sum\{p(j)\xi(j)/p(i_S) : j_S = i_S\}$.

Define $d^S_i = p(i) - p(i_S)p(i_{\tilde{S}})$. This measure of the failure of
independence of the $S$ loci from the $\tilde{S}$ loci relates the orthogonal
projection with the projection $P^p_S$ of Prop. 3. For $i_S \in I_S$:

(2.24) $\sum\{d^S_j \xi(j) : j_S = i_S\} = p(i_S)[E_p(\xi|i_S) - P^p_S(\xi)(i_S)]$.

The following special cases are of interest: (1) If $\xi \in \mathcal{L}_S$
then $E_p(\xi|S) = P^p_S(\xi) = \xi$ and the left side of (2.24) is 0. (2) If
$\xi \in \mathcal{L}_{\tilde{S}}$ and the mean $\bar{\xi}$ is 0 then $P^p_S(\xi) = 0$ and the left side of (2.24)
equals $p(i_S)E_p(\xi|i_S)$. (3) If the loci of $S$ and $\tilde{S}$ are independent
then $d^S_i = 0$ for all $i$ and the left side of (2.24) is 0. So
in this case $E_p(\xi|S) = P^p_S(\xi)$.

(b) The linear map $E^K: \mathring{\Delta} \to \mathring{\Delta}_K$ is a submersion with the
vertical subspace at $p$ equal to the tangent space at $p$ of the fibre of
$E^K$ through $p$. With respect to the Shahshahani
metric the horizontal subspace is equal to $T_p\mathcal{T}_K$. In the notation of
Thm. 7 (a), we define $V_K = \{(x^S) : |(x^S)| = 0\}$ and identify the tangent
space of $\mathring{\Delta}_K$ with the subspace $V_K$ at all points. The derivative
of the linear map $E^K$ is $E^K$ itself. $E^K$ restricts to an isomorphism
of $T_p\mathcal{T}_K$ on $V_K$. The composition $(E^K|T_p\mathcal{T}_K)^{-1} \circ E^K$ is the projection of
$T_p\mathring{\Delta}$ on $T_p\mathcal{T}_K$ orthogonal with respect to the Shahshahani metric. When
$F_p$ is the isomorphism of Prop. I.4.3 the following diagram commutes:

$\begin{array}{ccc} (\mathbb{R}^I)_0 & \xrightarrow{\ F_p\ } & T_p\mathring{\Delta} \\ \downarrow & & \downarrow \\ (\mathcal{L}_K)_0 & \xrightarrow{\ F_p\ } & T_p\mathcal{T}_K \end{array}$ $\quad (p \in \mathring{\Delta})$

Here $(\mathcal{L}_K)_0$ and $(\mathbb{R}^I)_0$ consist of the members of $\mathcal{L}_K$ and $\mathbb{R}^I$ respectively
with mean equal to zero. The left and right vertical maps are
projections orthogonal with respect to the covariance and Shahshahani
metrics respectively. When $p \in \Lambda$ the left map is $P^p_K$ (c.f. Prop. 3).

If $K = \bar{S}$ then with respect to the Shahshahani metric on the
simplex $\mathring{\Delta}_S$ too, $E^S: \mathring{\Delta} \to \mathring{\Delta}_S$ is a Riemannian submersion. The tangent
map at $p$ is given by the formula (for $X = \sum_i p(i)\xi(i)\partial_i$):

(2.25) $E^S(X) = \sum\{p(i_S)E_p(\xi|i_S)\,\partial_{i_S} : i_S \in I_S\}$.

(c) If $f: \mathring{\Delta} \to \mathbb{R}$ is a smooth map then the orthogonal projection
of the gradient $\nabla_p f$ to $T_p\mathcal{T}_K$ is the same as the gradient of the
restriction of $f$ to the leaf of $\mathcal{T}_K$ through $p$, $f|(\mathcal{T}_K)_p$. For the
latter gradient the restriction of the Shahshahani metric is used.


Proof : We begin with (b). Any onto linear map is a submersion and
the identification of the vertical and horizontal subspaces follows
from Thm. 7 (c). $(E^K|T_p\mathcal{T}_K)^{-1} \circ E^K$ maps $T_p\mathring{\Delta}$ onto $T_p\mathcal{T}_K$ and is the
identity on $T_p\mathcal{T}_K$. So it is a projection on $T_p\mathcal{T}_K$. Since its kernel is
the vertical subspace which is orthogonal to $T_p\mathcal{T}_K$ it is the
orthogonal projection. Prop. I.4.3 says that $F_p$ is an isometry and it
clearly maps $(\mathcal{L}_K)_0$ onto $T_p\mathcal{T}_K$. From this the commutative diagram
follows.

When $K = \bar{S}$, $E^S(\partial_i) = \partial_{i_S}$ for all $i \in I$ and so (2.25) follows
from the definition of $E_p(\xi|i_S)$ (which is (2.23)) and linearity. $X$ is a
horizontal vector iff $\xi \in (\mathcal{L}_S)_0$, meaning $\xi$ depends only on $i_S$ and
so $E_p(\xi|i_S) = \xi(i)$. So if $\xi,\eta \in (\mathcal{L}_S)_0$ with $p \in \mathring{\Delta}$,

$(\xi,\eta)_p = \sum_i p(i)\xi(i)\eta(i) = \sum_{i_S} p(i_S)E_p(\xi|i_S)E_p(\eta|i_S)$.

Via the isometry $F$ this says that $E^S$ is an isometry of $T_p\mathcal{T}_S$ on $T_{p^S}\mathring{\Delta}_S$.
So $E^S$ is a Riemannian submersion.

For (a): That $E_p(\xi|S)$ is the orthogonal projection on $\mathcal{L}_S$ is a
classical result about conditional expectation. But it also follows
from the diagram in (b) by the equation (2.25). Equation (2.24) and
the special cases are easy direct computations.
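
The classical fact quoted for (a), together with identity (2.24), can be checked numerically on a small case. The following is a minimal sketch, not part of the original argument: it assumes two loci with 2 and 3 alleles, takes S to be the first locus, and assumes that the projection of Prop. 3 averages over the complementary locus using its marginal distribution. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two loci with 2 and 3 alleles; a genotype i is a pair (i_1, i_2).
shape = (2, 3)
p = rng.random(shape) + 0.1
p /= p.sum()                      # interior distribution: every entry positive
xi = rng.standard_normal(shape)   # an arbitrary function on I

def inner(u, v):
    """Covariance-type inner product (u, v)_p = sum_i p(i) u(i) v(i)."""
    return float((p * u * v).sum())

# S = {locus 1}. Marginals and the conditional expectation (2.23).
p_S = p.sum(axis=1)               # p(i_S)
p_T = p.sum(axis=0)               # marginal on the complementary locus
cond_exp = (p * xi).sum(axis=1) / p_S

# Residual after subtracting the conditional expectation, lifted back to I.
residual = xi - cond_exp[:, None]

# Orthogonality to every function of i_S alone (indicators span them).
max_defect = 0.0
for a in range(shape[0]):
    eta = np.zeros(shape)
    eta[a, :] = 1.0
    max_defect = max(max_defect, abs(inner(residual, eta)))

# Identity (2.24), with P_S(xi)(i_S) taken to be the average of xi over the
# complementary locus weighted by its marginal (an assumption about Prop. 3).
d = p - np.outer(p_S, p_T)
lhs = (d * xi).sum(axis=1)
rhs = p_S * (cond_exp - xi @ p_T)
identity_gap = float(np.abs(lhs - rhs).max())

print(max_defect, identity_gap)
```

Both printed numbers should be zero up to rounding: the residual is orthogonal to every function of the first locus alone, and the two sides of (2.24) agree.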

(c) is a special case of a general result about the gradients
of restrictions to submanifolds. If $g$ is the restriction $f|(\mathcal{T}_K)_p$
then the gradient $\nabla_p g$ is characterized by: (1) $\nabla_p g \in T_p\mathcal{T}_K$ and
(2) $(\nabla_p g, X)_p = d_p g(X) = d_p f(X)$ for all $X \in T_p\mathcal{T}_K$. The orthogonal
projection of $\nabla_p f$ satisfies these conditions. QED

12 Corollary : Let $p \in \mathring{\Delta}$. If $\xi \in \mathcal{L}_K$ then $\xi$ is completely determined
by the set of conditional expectations $\{E_p(\xi|i_S) : S \in K\}$.

Proof : By subtracting $\bar{\xi} = E_p(\xi)$ if necessary we can assume that the
mean is zero. So $X = \sum_i p(i)\xi(i)\partial_i$ lies in the tangent space of $\mathring{\Delta}$ at
$p$. $E^S(X) = X^S = \sum p(i_S)E_p(\xi|i_S)\,\partial_{i_S}$. If $\xi \in \mathcal{L}_K$ then $X \in T_p\mathcal{T}_K$ and so
is the image of $(X^S : S \in K)$ under the isomorphism $(E^K|T_p\mathcal{T}_K)^{-1}$. QED


We can now see that if $p \in \mathring{\Delta} - \Lambda$ the orthogonal projection of
$\mathbb{R}^I$ onto $\mathcal{L}_K$ need not satisfy (2.11a).

For example, suppose $\xi \in \mathcal{L}_{\tilde{S}}$ and consider the orthogonal
projection of $\xi$ to $\mathcal{L}_S$. (2.11a) would say that the image lies in
$\mathcal{L}_{S \cap \tilde{S}} = \mathcal{L}_{\{\emptyset\}}$, the set of constant functions. If $\xi$ began
with mean 0 then (2.11a) would imply that the image has to be zero.
But unless $i_S$ and $i_{\tilde{S}}$ are independent with respect to $p$, equation
(2.24) and special case (2) show that the projection need not be zero.

We now prove a result which is useful in computing K-type
epistasis approximations to the selection field.


First, another lemma about projections:

13 Lemma : Let $V$ be a Euclidean vector space with metric $(\ ,\ )$.

(a) If $T: V \to V$ is an isometric isomorphism and $V_0$ is a $T$
invariant subspace, then the orthogonal projection, $P$, of $V$ on $V_0$
commutes with $T$.

(b) Let $V_1 \supset V_2$ be subspaces of $V$. If $P_1: V \to V_1$,
$P_2: V \to V_2$ and $P_{21}: V_1 \to V_2$ are orthogonal projections, then
$P_2 = P_{21} \circ P_1$.

Proof : (a): Let $v \in V$ and $u \in V_0$. By invariance, $TPv$ and $T^{-1}u$
lie in $V_0$. So, because $T$ is isometric and $P$ is orthogonal:

$(Tv - TPv, u) = (T(v - Pv), TT^{-1}u) = (v - Pv, T^{-1}u) = 0$.

So $TPv \in V_0$ with $Tv - TPv$ perpendicular to $V_0$. Thus, $TPv$ is the
orthogonal projection of $Tv$ on $V_0$, i.e. $TPv = PTv$.

(b): $P_{21} \circ P_1(v) \in V_2$ for $v \in V$. $v - P_{21} \circ P_1(v) = (v - P_1(v))
+ (P_1(v) - P_{21} \circ P_1(v))$ and so is perpendicular to $V_2$. Thus,
$P_{21} \circ P_1(v)$ is the orthogonal projection of $v$ on $V_2$, i.e.
$P_{21} \circ P_1(v) = P_2(v)$.

QED
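
Both parts of the lemma are easy to confirm numerically. The sketch below is illustrative only: it assumes $V = \mathbb{R}^6$ with the standard inner product, random subspaces, and an isometry built by rotating an invariant subspace and its orthogonal complement separately. The helper `projector` and all variable names are assumptions, not notation from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def projector(basis):
    """Orthogonal projector of R^n onto the column span of `basis`."""
    q, _ = np.linalg.qr(basis)
    return q @ q.T

n = 6
B1 = rng.standard_normal((n, 4))        # columns span V1
B2 = B1 @ rng.standard_normal((4, 2))   # columns span V2, contained in V1
P1, P2 = projector(B1), projector(B2)

# Part (b): on V1 the projection P21 agrees with P2, so P21 o P1 = P2 P1.
gap_b = float(np.abs(P2 @ P1 - P2).max())

# Part (a): build an isometry T leaving V0 = span of the first 3 columns of Q
# invariant, by rotating V0 and its orthogonal complement separately.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
R0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R1, _ = np.linalg.qr(rng.standard_normal((n - 3, n - 3)))
block = np.zeros((n, n))
block[:3, :3] = R0
block[3:, 3:] = R1
T = Q @ block @ Q.T
P = projector(Q[:, :3])
gap_a = float(np.abs(T @ P - P @ T).max())

print(gap_a, gap_b)
```

Both gaps should vanish up to rounding error.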


14 Proposition: (a) If $p_{ij}$ is a distribution on $I \times I$ which is
completely symmetric as a function of $i$ and $j$ and $K$ is a complex
on $L$, then the $(\ ,\ )_p$ orthogonal projection of $\mathbb{R}^{I \times I}$ onto $\mathcal{L}_{2K}$ (c.f.
(2.21)) commutes with $T_Q$ for all $Q \subset L$. In particular, such a
projection preserves complete symmetry. If $p_{ij} = p_i p_j$ for some
distribution $p$ on $I$, then $p_{ij}$ is completely symmetric iff all loci are
independent, i.e. iff $p \in \Lambda$. In that case all of the loci of $I \times I$
are independent with respect to $p_{ij}$.

When $K = \bar{S}$, the projection to $\mathcal{L}_{2\bar{S}} = \mathcal{L}_{S \times S}$ preserves complete
symmetry when the $S$ and $\tilde{S}$ loci are independent, i.e.
$d^S_i = p(i) - p(i_S)p(i_{\tilde{S}}) = 0$ for all $i$.

(b) There is a natural identification isomorphism between
$\mathcal{L}_{L \times \{\emptyset\}}$, consisting of functions of $ij$ depending only on $i$, and $\mathbb{R}^I$.
The identification maps $\mathcal{L}_{K \times \{\emptyset\}}$ isomorphically onto $\mathcal{L}_K$. If $p_{ij} = p_i p_j$
then the identification is an isometry of $(\ ,\ )_{p_{ij}}$ with $(\ ,\ )_p$.

(c) Let $p_{ij} = p_i p_j$ and let $K$ be a complex on $L$. The
following diagram commutes with the arrows all orthogonal projections
and the isomorphisms the identification map of (b):

$\begin{array}{ccccc} \mathbb{R}^{I \times I} & \longrightarrow & \mathcal{L}_{L \times \{\emptyset\}} & \cong & \mathbb{R}^I \\ \downarrow & & \downarrow & & \downarrow \\ \mathcal{L}_{2K} & \longrightarrow & \mathcal{L}_{K \times \{\emptyset\}} & \cong & \mathcal{L}_K \end{array}$

Proof : If $p_{ij}$ is completely symmetric then it is easy to check that
$T_Q$ is an isometry of $(\ ,\ )_p$. In fact, it is the map induced by
a measure preserving bijection of $I \times I$. Since $T_Q(\mathcal{L}_{S \times T}) = \mathcal{L}_{\bar{S} \times \bar{T}}$ where
$\bar{S} \times \bar{T} = T_Q(S \times T)$, it is clear that $\mathcal{L}_{2K}$ is $T_Q$ invariant. That the
projection commutes with $T_Q$ and so preserves complete symmetry follows
from Lemma 13 (a).

Clearly, if $p \in \Lambda$ and $p_{ij} = p_i p_j$ then all of the loci are $p_{ij}$
independent and $p_{ij}$ is completely symmetric. For the converse, sum
the equation $p(i)p(j) = p(i_Q j_{\tilde{Q}})p(j_Q i_{\tilde{Q}})$ on $j$ to get $p(i) = p(i_Q)p(i_{\tilde{Q}})$
or $d^Q_i = 0$. That this holds for all $i$ and $Q$ implies $p \in \Lambda$ by the
proofs of Prop. I.5.1 and its Corollary.

Finally, if $m \in \mathbb{R}^{I \times I}$ it is easy to check that, for fixed $k, l \in I$,

$T_Q P^{kl}_{S \times S}(m)(ij) = m(i_{S \cap Q}\, j_{S \cap \tilde{Q}}\, k_{\tilde{S}},\ j_{S \cap Q}\, i_{S \cap \tilde{Q}}\, l_{\tilde{S}}) = P^{kl}_{S \times S} T_{Q \cup \tilde{S}}(m)(ij)$.

Averaging $k_{\tilde{S}} l_{\tilde{S}}$ by the distribution $p$ we get for all $p$:

(2.26) $T_Q P^p_{S \times S} = P^p_{S \times S} T_{Q \cup \tilde{S}}$.

Now if $p_{ij} = p_i p_j$ and $d^S_i = 0$ for all $i$ then $d^{S \times S}_{ij} = 0$ for all $ij$ in
$I \times I$ and so by Prop. 11 (a) $P^p_{S \times S}$ is the orthogonal projection. So
that projection preserves complete symmetry.

(b): $I \times I_{\emptyset}$ is just $I$ and the identification is a
special case of the identification between $\mathcal{L}_S$ and $\mathbb{R}^{I_S}$ by the
projection $I \to I_S$. Preservation of $\mathcal{L}_K$ is clear and the isometry result
is an easy computation.

(c): The square on the left commutes because both compositions
are the orthogonal projection to $\mathcal{L}_{K \times \{\emptyset\}}$ by Lemma 13 (b). The square
on the right commutes because the identification in (b) is an isometry.

QED

Remark: By Prop. 11 (a) if $m \in \mathbb{R}^{I \times I}$ and $P$ is the $(\ ,\ )_{p_{ij}}$
projection on $\mathcal{L}_{L \times \{\emptyset\}}$ followed by the identification with $\mathbb{R}^I$:

$P(m)_i = E_p(m|i) = \sum_j p_{ij} m_{ij} \big/ \sum_j p_{ij}$.

So if $p_{ij} = p_i p_j$ the projection to $\mathbb{R}^I$ is $m_i = \sum_j p_j m_{ij}$. The diagram
of Prop. 14 (c) then says that we can get the projection of $m_i$ to $\mathcal{L}_K$
by first projecting $m_{ij}$ to $\mathcal{L}_{2K}$ to get $n_{ij} \in \mathcal{L}_{2K}$ and then averaging
$n_{ij}$ on $j$ to get $n_i \in \mathcal{L}_K$. If $m_{ij}$ is completely symmetric and $p \in \Lambda$ then
part (a) implies that $n_{ij}$ is completely symmetric.

In applying these results to K type epistasis for a particular
complex $K$ we need a basis or at least a spanning set for the $(\ ,\ )$
perpendicular complement $B_K$ of the subspace $\mathcal{L}_K$. By Thm. 7 (c) the
set $\{b^{ij} : i \in I\}$ and any fixed choice of $j$ is such a spanning set.
More generally, Prop. 3 and Lemma 6 imply that for any fixed $p$ (as in Prop. 3)
$\{\sum_j p_j b^{ij} : i \in I\}$ spans $B_K$. However, special examples of complexes
$K$ may have special spanning sets associated with them. We now
construct such a set for the case where $K$ is the $s$ skeleton
$L^{(s)}$ of $L$ (cf. Sec. 2 of Chap. I):

(2.27) $L^{(s)} = \{S \subset L : \dim S \le s\}$.

Recall from Def. 5 that the dimension of $S$ is the cardinality of
$S$ minus one. Thus, for example, $L^{(0)}$ consists of all of the
singleton sets $\{\alpha\}$ of $L$ together with $\emptyset$. If $K = L^{(s)}$ we will write
$\mathcal{L}^{(s)}$ for $\mathcal{L}_K$.

For $L^{(s)}$ we get a useful spanning set by beginning, as in the
general case, with a linear operator.

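For every order preserving map $\lambda: L \to \{0,\dots,s+1\}$, and every
map $\varepsilon: \{0,\dots,s+1\} \to \{0,1\}$ and $i,j \in I$ define $(i,j,\lambda,\varepsilon) \in I$ by

(2.28) $(i,j,\lambda,\varepsilon)_\alpha = i_\alpha$ if $\varepsilon \circ \lambda(\alpha) = 0$, and $(i,j,\lambda,\varepsilon)_\alpha = j_\alpha$ if $\varepsilon \circ \lambda(\alpha) = 1$.

The map $\lambda$ partitions $L$ into $s + 2$ disjoint subsets, namely
$\lambda^{-1}(q)$ for $q = 0,\dots,s+1$. $\varepsilon \circ \lambda$ then labels each subset with either a
0 or a 1. $(i,j,\lambda,\varepsilon)$ is the member of $I$ agreeing with $i$ on the
subsets labelled with a 0 and with $j$ on the subsets labelled with 1.
So $(i,j,\lambda,\varepsilon) = i_S j_{\tilde{S}}$ for $S = (\varepsilon \circ \lambda)^{-1}(0) \subset L$.

Now define $|\varepsilon| = \varepsilon(0) + \dots + \varepsilon(s+1)$ = the number of times 1 is
hit by $\varepsilon$, and for $\xi \in \mathbb{R}^I$ define

(2.29) $D^\lambda_{ij}(\xi) = \sum_\varepsilon (-1)^{|\varepsilon|}\, \xi_{(i,j,\lambda,\varepsilon)}$

where the sum is taken over the $2^{s+2}$ maps $\varepsilon$.

As in Lemma 6, if $X_i = p_i \xi_i$ and $L^\lambda_{ij}(p)$ is defined for $p \in \mathring{\Delta}$ by

(2.30) $L^\lambda_{ij}(p) = \sum_\varepsilon (-1)^{|\varepsilon|} \ln p_{(i,j,\lambda,\varepsilon)}$

then we have

(2.31) $(\nabla_p L^\lambda_{ij}, X)_p = D^\lambda_{ij}(\xi)$.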

To relate $D^\lambda_{ij}$ to the previous operators, define
$S(\lambda,q) = L - \lambda^{-1}(q)$, the complement of $\lambda^{-1}(q)$, and note that

$D^\lambda_{ij}(\xi) = \Big[\prod_{q=0}^{s+1} D^j_{S(\lambda,q)}\Big](\xi)(i)$.

Here the proof is just a matter of expanding the product out using
$D_S = 1 - P_S$ and the equation (cf. (2.10b)):

$\xi_{(i,j,\lambda,\varepsilon)} = \Big[\prod\{P^j_{S(\lambda,q)} : \varepsilon(q) = 1\}\Big](\xi)(i)$.

Thus, $D^\lambda_{ij}(\xi) = 0$ for all $i,j$ and $\lambda$ iff for every $\lambda$, $\xi$ is
the sum of terms each concentrated on some $S(\lambda,q)$, i.e. not
dependent on the variables in $\lambda^{-1}(q)$ for some $q$. Now if $\xi$ depends on
some bloc of $s + 1$ variables, $\lambda$ applied to these variables misses
some $q \in \{0,\dots,s+1\}$, i.e. the bloc is contained in some $S(\lambda,q)$.
Hence, if $\xi \in \mathcal{L}^{(s)}$ then $D^\lambda_{ij}(\xi) = 0$ for all $i,j$ and $\lambda$. On the other
hand if the carrier of $\xi$ has dimension greater than $s$ and so
contains some set $S$ with $\dim S \ge s + 1$ and upon which $\xi$ depends,
then choosing $\lambda$ so that $\lambda$ maps $S$ onto $\{0,\dots,s+1\}$, $S \not\subset S(\lambda,q)$
for any $q$, and so by definition of the carrier of $\xi$, $\xi$ can't lie
in $\mathcal{L}_K$ where $K = \bigvee_q S(\lambda,q)$. So some $D^\lambda_{ij}(\xi)$ is not 0. Thus, $\mathcal{L}^{(s)}$ is
the intersection of the kernels of $D^\lambda_{ij}$. This proves the following:

15 Lemma: Define $b^\lambda_{ij} \in \mathbb{R}^I$ so that with $b = b^\lambda_{ij}$, $L^b = L^\lambda_{ij}$. That is,
let $(b^\lambda_{ij})_k$ be the coefficient of $\ln p_k$ in (2.30). $\{b^\lambda_{ij} : i,j \in I$ and
$\lambda: L \to \{0,\dots,s+1\}\}$ is a spanning set for $B^{(s)}$.

Remarks : 1. The above proof shows that we need only include
surjective maps $\lambda$ to get a spanning set. If $\lambda$ is surjective and $i,j$
are fixed in $I$ then the $2^{s+2}$ expressions $(i,j,\lambda,\varepsilon)$ are all distinct.
So from (2.30) the coefficient of $\ln p_k$ in $L^\lambda_{ij}(p)$ is either $\pm 1$ or 0.

2. In the other direction, we did not need that $\lambda$ was
order-preserving to define $b^\lambda_{ij}$ or to show $b^\lambda_{ij} \in B^{(s)}$. The order-preserving
condition merely reduces the size of the spanning set.



III. Selection, Recombination and Mutation


1. Selection and Epistasis.

The general model of frequency dependent selection is a vectorfield
$X = \sum_i X_i \partial_i = \sum_i p_i \xi_i \partial_i$ on $\Delta$ where the components $X_i$ and $\xi_i$ are
$C^\infty$ functions of the state $p \in \Delta$. We think of $\xi_i(p)$ as the relative
fitness of gametic genotype $i$ when the population distribution is
at $p$. The fitness is relative because $\xi$ is normalized to mean
zero, i.e. $\sum_i p_i \xi_i(p) = \sum_i X_i(p) = 0$. Restricting to the interior
distributions $\mathring{\Delta}$, we interpret the results of Chap. II to describe at
most K-type epistasis or K-type epistasis, for short, when $K$ is a
complex in the set of loci $L$.

Thm. II.2.8 gives a list of equivalent conditions which define
what we mean by K-type epistasis. It means (by part (c) of the Thm.)
that there exist $C^\infty$ functions $\varphi^S : \mathring{\Delta} \to \mathbb{R}^{I_S}$ for $S \in K$ such that

(1.1) $\xi_i(p) = \sum_{S \in K} \varphi^S_{i_S}(p)$.

So for each fixed $p \in \mathring{\Delta}$, $\xi_i(p)$ regarded as a function of $i$ is a
sum of terms each depending only on the alleles in some bloc $S$ of
loci in $K$, i.e. $\xi(p) \in \mathcal{L}_K$. Note that, in general, the coordinate
functions $\varphi^S_{i_S}$ depend on the entire distribution $p$ and not just on the
partial distribution $p^S$ induced on $I_S$, i.e. they do not factor through
$E^S$.

We can test for K-type epistasis using part (d): $X$ has K-type
epistasis if $\sum_i b_i \xi_i(p) = 0$ for all $p \in \mathring{\Delta}$ whenever $b$ is a vector
of the $(\ ,\ )$ orthogonal complement $B_K$ of $\mathcal{L}_K$. In applying this
test it suffices to check it for $b$ in some spanning set for $B_K$.

Geometrically, K-type epistasis means that the leaves of the
transverse foliation $\mathcal{T}_K$ are invariant with respect to solutions of
the differential equation associated to $X$. Since these leaves are
defined by the equations $\{L^b = \text{constant}\}$ this is a conservation
principle. The leaves act like energy levels with energy conserved by
selection. To be precise, for $b \in B_K$, the function

(1.2) $L^b(p) = \ln \prod_i (p_i)^{b_i}$

remains constant as $p$ changes under the influence of a selection
field exhibiting K-type epistasis. Alternatively, the gene ratios
which are the antilogs of the functions $L^b$ remain constant for $b \in B_K$
iff the selection field satisfies K-type epistasis.
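
The conservation principle can be illustrated in the simplest setting. The sketch below is an illustration under simplifying assumptions, not the general frequency-dependent model: two loci with two alleles each, constant gametic fitnesses $w_i$ so that $\xi_i = w_i - \bar{w}$, and the closed-form solution $p_i(t) \propto p_i(0)e^{w_i t}$ of $dp_i/dt = p_i \xi_i$. The function names and the numerical fitness values are hypothetical.

```python
import numpy as np

def log_ratio(p):
    """L^b(p) = ln[p_00 p_11 / (p_01 p_10)] for two loci with two alleles each."""
    return float(np.log(p[0, 0]) + np.log(p[1, 1])
                 - np.log(p[0, 1]) - np.log(p[1, 0]))

def flow(p0, w, t):
    """Exact solution of dp_i/dt = p_i (w_i - mean w): p_i(t) ~ p_i(0) exp(w_i t)."""
    q = p0 * np.exp(w * t)
    return q / q.sum()

p0 = np.array([[0.4, 0.1], [0.2, 0.3]])

# Additive (zero-epistasis) fitness: w(i_1, i_2) = a(i_1) + c(i_2).
a = np.array([0.3, -0.1])
c = np.array([0.05, 0.2])
w_additive = a[:, None] + c[None, :]

# The same fitness plus an interaction term on genotype (1, 1).
w_epistatic = w_additive.copy()
w_epistatic[1, 1] += 0.5

drift_additive = log_ratio(flow(p0, w_additive, 3.0)) - log_ratio(p0)
drift_epistatic = log_ratio(flow(p0, w_epistatic, 3.0)) - log_ratio(p0)
print(drift_additive, drift_epistatic)
```

With additive fitness the log ratio does not move; with the interaction term it drifts linearly in time at rate $\sum_i b_i w_i = 0.5$, so the change over three time units is 1.5.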

If $X$ is the gradient vectorfield of some fitness function
$f: \mathring{\Delta} \to \mathbb{R}$, i.e. $X(p) = \nabla_p f$, then by part (g) $X$ satisfies K-type
epistasis iff $f$ factors through the map $E^K$. This means that the
value of $f$ at $p$ depends only on induced distributions $p^S$ for
$S \in K$. In contrast to the general case above, for a gradient field
the functions $\varphi^S$ do factor through $E^K$.

1 Proposition: Let $X$ be a gradient vectorfield satisfying K-type
epistasis. Each component $\xi_i$ regarded as a function of $p$ factors
through $E^K$. Furthermore, the functions $\varphi^S$ for $S \in K$ can be chosen
to factor through $E^K$.

Proof : Recall from equation I.(4.12) that if $X = \nabla f$ then for any
smooth extension of $f: \mathring{\Delta} \to \mathbb{R}$ to a function on $\mathring{P}$, $\xi_i = \partial f/\partial x_i -
\sum_j p_j\, \partial f/\partial x_j$. In particular we can use the extension defined by:

(1.3) $f(x) = f(x/|x|)$, $x \in \mathring{P}$.

This is the unique extension of $f$ to a function on $\mathring{P}$ which is
homogeneous of degree zero. By Euler's Thm. $\sum_j x_j\, \partial f/\partial x_j = 0$ for such
a function (see Shahshahani [28, p. 2]) and so we have

(1.4) $\xi_i(p) = \dfrac{\partial f}{\partial x_i}\Big|_p$ for all $p \in \mathring{\Delta}$.

Now for $b \in B_K$ we know that $0 = \sum_i b_i \xi_i = \sum_i b_i (\partial f/\partial x_i)$ at every
point $p$ of $\mathring{\Delta}$. By homogeneity this equation holds for the
extension of $f$ at all points of $\mathring{P}$. So we can differentiate again to
get:

(1.5) $0 = \sum_i b_i \dfrac{\partial^2 f}{\partial x_i \partial x_j} = \sum_i b_i \dfrac{\partial \xi_j}{\partial x_i}$ for all $b \in B_K$.

Applying Thm. II.2.8 to $\xi_j$ with $j$ fixed we have that $\nabla_p \xi_j \in \mathcal{L}_K$ and
so $\xi_j$ factors through $E^K$. Now the inductive construction of the
functions $\varphi^S$ in the proof of Thm. II.2.8 yields functions all
factoring through $E^K$. QED

Remark : That $X$ was gradient was really used in (1.5) which needs
$\partial \xi_i/\partial x_j = \partial \xi_j/\partial x_i$ for all $i$ and $j$.


Our selection field is a gradient with $f = \tfrac{1}{2}\bar{m}$. $\bar{m}$ is mean
fitness defined by the symmetric matrix $m_{ij}$ of fitness constants.
Now Thm. II.2.9 applies to describe K type epistasis of the selection
field. Furthermore, in the completely symmetric case the
decomposition of the gametic fitness function into a sum of K-blocs extends
to the zygotic fitness numbers (cf. (e$_{sym}$)):

(1.6) $m_{ij} = \sum_{S \in K} m^S(i_S, j_S)$, $\quad m^S \in \mathbb{R}^{I_S \times I_S}$.

To get $m_i$ we multiply by $p_j$ and sum on $j$. Apply this to the
above equation and reverse the order of summation. Then for each
fixed $S$ sum on the complementary $\tilde{S}$ loci first. Applying equation
II.(2.15) and letting $p^S$ denote $E^S(p)$, we get:

(1.7) $m_i = \sum_S m^S(i_S)$, where $m^S(i_S) = \sum_{j_S} p^S(j_S)\, m^S(i_S, j_S)$.

Similarly:

(1.8) $\bar{m} = \sum_S \bar{m}^S$, where $\bar{m}^S = \sum_{i_S, j_S} p^S(i_S)\, p^S(j_S)\, m^S(i_S, j_S)$.

We now turn to some examples:

Zero Epistasis ($K = L^{(0)}$): This is the case described in
detail in Sec. 5 of Chap. I. The projection $E$ ($= E^{L^{(0)}}$; we omit
the superscript for this case alone) is the product $\prod_\alpha E^\alpha$ mapping $\mathring{\Delta}$
onto $\prod_\alpha \mathring{\Delta}_\alpha$. $\xi \in \mathbb{R}^I$ lies in $\mathcal{L}^{(0)}$ iff there exist $y^\alpha \in \mathbb{R}^{I_\alpha}$ ($\alpha = 1,\dots,\ell$)
such that

(1.9) $\xi_i = \sum_\alpha y^\alpha_{i_\alpha}$ for $i \in I$.

The orthogonal complement $B^{(0)}$ is spanned by the coefficient
vectors of $L^S_{ij}$ defined by I.(5.2). To see this apply Lemma II.2.15
and the remarks thereafter. Here $s = 0$ and so $\lambda$ maps $L$ to $\{0,1\}$.
Letting $S = \lambda^{-1}(0)$ the four elements of $I$ of the form $(i,j,\lambda,\varepsilon)$
corresponding to the four maps $\varepsilon: \{0,1\} \to \{0,1\}$ are just $i$, $j$, $\bar{i} = i_S j_{\tilde{S}}$,
and $\bar{j} = j_S i_{\tilde{S}}$. So $L^\lambda_{ij} = L^S_{ij}$ in this case. Choosing a basis from
among these maps we define $L: \mathring{\Delta} \to \mathbb{R}^d$. In particular, the leaf of
maximum entropy $L^{-1}\{0\}$ is the Wright manifold $\Lambda$ in this case.

Lemma II.2.15 shows that to get a spanning set we need only
consider order preserving maps $\lambda$. Such a map is determined by a
choice of $\mu \in L$ so that $\lambda^{-1}(0) = S = \{\alpha : \alpha \le \mu\}$. We adopt
Shahshahani's notation and let $(i:\mu:j)$ be the element of $I$ whose
allele at the $\alpha$ locus is $i_\alpha$ if $\alpha \le \mu$ and is $j_\alpha$ if $\alpha > \mu$. Then $L^\mu_{ij}$ is:

(1.10) $L^\mu_{ij} = \ln[p_i p_j / p_{(i:\mu:j)} p_{(j:\mu:i)}] = \ln p_i - \ln p_{(i:\mu:j)} - \ln p_{(j:\mu:i)} + \ln p_j$.

Since the coefficient vectors of the $L^\mu_{ij}$'s span $B^{(0)}$, $\xi \in \mathcal{L}^{(0)}$
iff for all $i,j \in I$ and $\mu \in L$:

(1.11) $\xi_i - \xi_{(i:\mu:j)} - \xi_{(j:\mu:i)} + \xi_j = 0$.

The mathematical fact that the $L^\mu_{ij}$'s span the same space as
the larger set of $L^S_{ij}$'s is essentially the same as the biological fact
that an exchange of genetic material between the chromosomes at
exactly the loci in $S$ for any subset $S$ of $L$ can be obtained by
a sufficiently intricate (and so sufficiently unlikely) sequence of
single cross overs.
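
Test (1.11) is straightforward to run on a finite example. The following minimal sketch assumes three loci with 2, 3 and 2 alleles, loci indexed from 0, and the convention that $(i:\mu:j)$ takes the alleles of $i$ at loci up to and including $\mu$. The helper names `splice` and `max_defect` are illustrative.

```python
import itertools
import numpy as np

def splice(i, j, mu):
    """(i:mu:j): alleles of i at loci 0..mu, alleles of j at the later loci."""
    return i[: mu + 1] + j[mu + 1 :]

def max_defect(xi, n_alleles):
    """Largest |xi_i - xi_(i:mu:j) - xi_(j:mu:i) + xi_j| over all i, j, mu."""
    genotypes = list(itertools.product(*[range(n) for n in n_alleles]))
    worst = 0.0
    for i, j in itertools.product(genotypes, repeat=2):
        for mu in range(len(n_alleles) - 1):
            d = xi[i] - xi[splice(i, j, mu)] - xi[splice(j, i, mu)] + xi[j]
            worst = max(worst, abs(d))
    return worst

rng = np.random.default_rng(4)
n_alleles = (2, 3, 2)
y = [rng.standard_normal(n) for n in n_alleles]

# Additive function: a sum of one term per locus, as in zero epistasis.
additive = y[0][:, None, None] + y[1][None, :, None] + y[2][None, None, :]

# Add an interaction supported on the single genotype (1, 2, 0).
epistatic = additive.copy()
epistatic[1, 2, 0] += 1.0

defect_additive = max_defect(additive, n_alleles)
defect_epistatic = max_defect(epistatic, n_alleles)
print(defect_additive, defect_epistatic)
```

The additive function passes the test up to rounding, while the perturbed one fails it, for instance with $i = (1,2,0)$, $j = (0,0,1)$ and a cut after the first locus.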


If $m_{ij}$ is completely symmetric then Thm. II.2.9 says that the
following conditions are equivalent and define zero epistasis for the
selection field $\nabla(\tfrac{1}{2}\bar{m})$:

(a) Mean fitness $\bar{m}$ factors through $E$, i.e. depends only on
the individual gene frequencies $p(i_\alpha)$.

(b) For all $\mu$, $i$ and $j$, $L^\mu_{ij}$ is an integral of the
selection vectorfield $\nabla(\tfrac{1}{2}\bar{m})$, i.e. the ratios $p_i p_j / p_{(i:\mu:j)} p_{(j:\mu:i)}$ remain
the same as the genotype distribution changes only according to
selection.

(c) The leaf $\Lambda$ is left invariant by selection, i.e. $\nabla(\tfrac{1}{2}\bar{m})$ is
tangent to $\Lambda$ at points of $\Lambda$.

(d) There exist symmetric functions $m^\alpha \in \mathbb{R}^{I_\alpha \times I_\alpha}$ such that

(1.12) $m_{ij} = \sum_\alpha m^\alpha_{i_\alpha j_\alpha}$.

Some remark about (c) is in order. The other conditions imply
that $\nabla(\tfrac{1}{2}\bar{m})$ is everywhere tangent to the transverse foliation $\mathcal{T}$. This
implies (c). Conversely, (c) means that $m_i(p)$ as a function of $i$
lies in $\mathcal{L}^{(0)}$ for $p \in \Lambda$. Let $p$ approach the distribution
concentrated at $j$ through distributions in $\Lambda$. $m_i(p)$ approaches $m_{ij}$ and
so $m_{ij}$ lies in $\mathcal{L}^{(0)}$ as a function of $i$ for all $j$. This is
condition (d) of Thm. II.2.9.

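As (1.12) is a special case of (1.6) we get the corresponding
special cases of (1.7) and (1.8):

(1.13) $m_i = \sum_\alpha m^\alpha_{i_\alpha}$, where $m^\alpha_{i_\alpha} = \sum_{j_\alpha} p^\alpha_{j_\alpha} m^\alpha_{i_\alpha j_\alpha}$,

(1.14) $\bar{m} = \sum_\alpha \bar{m}^\alpha$, where $\bar{m}^\alpha = \sum_{i_\alpha, j_\alpha} p^\alpha_{i_\alpha} p^\alpha_{j_\alpha} m^\alpha_{i_\alpha j_\alpha}$.

It is important to understand what these equations do not say.
$m^\alpha_{i_\alpha}$ need not be the average fitness of a gamete carrying allele $i_\alpha$ at
the $\alpha$ locus. With $p \in \mathring{\Delta}$ this average fitness is the conditional
expectation of the random variable $m_i$ assuming that the allele at the
$\alpha$ locus is $i_\alpha$. This expression is computed in Prop. II.2.11 (a):

(1.15) $m(i_\alpha) = E_p(m|i_\alpha) = m^\alpha_{i_\alpha} + \sum_{\beta \ne \alpha} \sum_{i_\beta} p(i_\alpha i_\beta)\, m^\beta_{i_\beta} / p(i_\alpha)$
$\qquad = m^\alpha_{i_\alpha} + \sum_{\beta \ne \alpha} \bar{m}^\beta + \sum_{\beta \ne \alpha} \sum_{i_\beta} d_{i_\alpha i_\beta}\, m^\beta_{i_\beta} / p(i_\alpha)$,

where $d_{i_\alpha i_\beta} = p(i_\alpha i_\beta) - p(i_\alpha)p(i_\beta)$. So $m(i_\alpha)$ depends on the joint
distribution of $i_\alpha$ and $i_\beta$. Thus, even after normalizing to mean zero,
$m(i_\alpha) - \bar{m}$ differs from $m^\alpha_{i_\alpha} - \bar{m}^\alpha$ by the third term in the sum which
is due to linkage disequilibrium. Now if $p \in \Lambda$ the loci are
independent and so this last term does vanish. Furthermore on $\Lambda$ we can
describe the selection field quite simply: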


2 Proposition : $E: \Lambda \to \prod_\alpha \mathring{\Delta}_\alpha$ is a diffeomorphism. If we put the
Shahshahani metric on $\Lambda$ and the product of the Shahshahani metrics
of $\mathring{\Delta}_\alpha$ on the product $\prod_\alpha \mathring{\Delta}_\alpha$, then $E$ is an isometry of Riemannian
manifolds.

If $m_{ij}$ is given by (1.12) then $\nabla(\tfrac{1}{2}\bar{m})$ is tangent to $\Lambda$ and maps
under $E$ to the vectorfield which on the factor $\mathring{\Delta}_\alpha$ is $\nabla(\tfrac{1}{2}\bar{m}^\alpha)$ where
the latter gradient is with respect to the Shahshahani metric on $\mathring{\Delta}_\alpha$.

Proof : The isometry result is an easy computation using Prop.
II.2.11 (b) and the orthogonality of vectors depending on $i_\alpha$ with vectors
depending on $i_\beta$ when $\alpha \ne \beta$ and $p \in \Lambda$. The isometry result implies the
gradient results which also follow by direct computation using Prop.
II.2.11 (a) and (1.15).

QED


This means that if there is zero epistasis the selection field
on $\Lambda$ is just the sum of the separate effects of the different loci
with each depending only on the gene frequencies at that locus. Thus,
there is here no problem of "genetic background".
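
The distinction between the additive component of gametic fitness and the average fitness of an allele can be made concrete. The sketch below is illustrative only: it assumes two loci with 2 and 3 alleles, random symmetric per-locus tables standing in for the $m^\alpha$ of (1.12), and compares a generic interior distribution with the product of its marginals (a point of the Wright manifold). The function `analyse` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 2, 3                                 # alleles at loci 1 and 2

def sym(n):
    """Random symmetric n x n table."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / 2

m1, m2 = sym(n1), sym(n2)                     # per-locus tables m^1, m^2
# Zygotic fitness m[i1, i2, j1, j2] = m^1[i1, j1] + m^2[i2, j2], as in (1.12).
m = m1[:, None, :, None] + m2[None, :, None, :]

def analyse(p):
    """Return (gap in (1.13), max |E_p(m | i_1) - (m^1_{i_1} + mbar^2)|)."""
    p1, p2 = p.sum(axis=1), p.sum(axis=0)     # gene frequencies
    m_i = np.einsum("abcd,cd->ab", m, p)      # m_i = sum_j p_j m_ij
    c1, c2 = m1 @ p1, m2 @ p2                 # per-locus components m^alpha_{i_alpha}
    gap_113 = float(np.abs(m_i - (c1[:, None] + c2[None, :])).max())
    cond = (p * m_i).sum(axis=1) / p1         # E_p(m | i_1)
    mbar2 = float(p2 @ m2 @ p2)
    return gap_113, float(np.abs(cond - (c1 + mbar2)).max())

linked = rng.random((n1, n2)) + 0.1
linked /= linked.sum()                        # generic interior distribution
product = np.outer(linked.sum(axis=1), linked.sum(axis=0))  # independent loci

gap_linked, ld_term_linked = analyse(linked)
gap_product, ld_term_product = analyse(product)
print(gap_linked, ld_term_linked, gap_product, ld_term_product)
```

The decomposition of $m_i$ into per-locus components holds for both distributions, since it uses only the gene frequencies. The conditional expectation agrees with the per-locus component plus the mean of the other locus only for the product distribution; for the generic one the linkage disequilibrium term is visibly nonzero.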

One-dimensional Epistasis ($K = L^{(1)}$): Again, apply Lemma
II.2.15. In this case $\lambda$ is an order-preserving map onto $\{0,1,2\}$ and
so corresponds to picking two positions $1 \le \mu < \nu < \ell$. Expanding
(2.30) for this $\lambda$, $L^\lambda_{ij}$ becomes

(1.16) $L^{\mu\nu}_{ij} = \ln \dfrac{p_i\, p_{(i:\mu:j)}\, p_{(j:\nu:i)}\, p_{(j:\mu:i:\nu:j)}}{p_j\, p_{(j:\mu:i)}\, p_{(i:\nu:j)}\, p_{(i:\mu:j:\nu:i)}}$.

Thus, there is one-dimensional epistasis iff these ratios are
preserved for all $\mu < \nu$, $i$ and $j$. One dimensional epistasis is common,
implicitly, if not in the world at least in a lot of genetic models
because of the following:

3 Proposition : Let $m_{ij}$ be given by (1.12) and let $F: \mathbb{R} \to \mathbb{R}$ be some
quadratic function, i.e. $F(t) = at^2 + bt + c$. If $n_{ij} = F(m_{ij})$ then
$n_{ij}$ is completely symmetric and exhibits at most one-dimensional
epistasis.

Proof : Squaring out the sum one gets cross terms like $2a\, m^\alpha_{i_\alpha j_\alpha} m^\beta_{i_\beta j_\beta}$
which depend on two loci. So $n_{ij}$ is the sum of terms depending on
at most two loci each. The result follows from Thm. II.2.9. Complete
symmetry is obvious. QED
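
The mechanism of the proof can be checked with the alternating sum of (2.29) for $s = 1$. The sketch below is a simplified illustration, assuming three loci and a trait that is a sum of one term per locus (each locus variable standing for the pair $(i_\alpha, j_\alpha)$); it is not the full zygotic setting. A quadratic function of the trait has vanishing third-order alternating sums, while a cubic one does not. The names `third_difference` and `max_third_difference` are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_alleles = (2, 3, 2)
y = [rng.standard_normal(n) for n in n_alleles]   # additive contributions
trait = y[0][:, None, None] + y[1][None, :, None] + y[2][None, None, :]

def third_difference(g, i, j):
    """Alternating sum over the 8 genotypes mixing i and j locus by locus."""
    total = 0.0
    for eps in itertools.product((0, 1), repeat=3):
        mixed = tuple(j[a] if eps[a] else i[a] for a in range(3))
        total += (-1) ** sum(eps) * g[mixed]
    return total

def max_third_difference(g):
    """Largest absolute alternating sum over all pairs of genotypes."""
    genotypes = list(itertools.product(*[range(n) for n in n_alleles]))
    return max(abs(third_difference(g, i, j))
               for i, j in itertools.product(genotypes, repeat=2))

quadratic = 1.5 * trait**2 - 0.7 * trait + 0.2   # F(t) = a t^2 + b t + c
cubic = trait**3                                 # has a three-locus cross term

defect_quadratic = max_third_difference(quadratic)
defect_cubic = max_third_difference(cubic)
print(defect_quadratic, defect_cubic)
```

The quadratic case gives zero up to rounding because every term of the expanded square involves at most two loci; the cubic case contains a product of three single-locus terms and fails the test.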


This quadratic model arises, for example, if we assume that the
genes act additively to yield some metric trait measured by $m_{ij}$ and
that fitness $n_{ij}$ is given by:

(1.17) $n_{ij} = c_1 - c_2 (m_{ij} - m_0)^2$

where $m_0$ is some optimal value of the trait.

Adjacent Locus Interaction ($K = \{1,2\} \vee \{2,3\} \vee \dots \vee \{\ell-1,\ell\}$):
In this case we regard the set $L$ of loci as lying in order along a
single chromosome. The only interactions are pairwise and of these
only adjacent loci interact. So $K$ is contained in $L^{(1)}$ but is
much more restricted.


4 Proposition: For $K = \{1,2\} \vee \dots \vee \{\ell-1,\ell\}$, a
spanning set for $B_K$ is given by the coefficient vectors of two
families of functions: $\{L^{\lambda}_{ij} : i,j \in I\}$ (cf. (1.16))
and $\{L^{\mu}_{ij} : i,j \in I \text{ such that } i_\mu = j_\mu\}$
(cf. (1.10)).

Proof: By the previous example, $(b,\xi) = 0$ for all
$b = b^{\lambda}_{ij}$ iff $\xi$ has only pairwise interactions, i.e.
iff for all pairs $\{\alpha,\beta\}$ with $\alpha < \beta$ there
exists $\varphi^{\alpha\beta} \in \mathbb{R}^{I_\alpha \times I_\beta}$
such that

(1.18)  $$\xi(i) = \sum_{\alpha,\beta} \varphi^{\alpha\beta}(i_\alpha, i_\beta).$$

Now if $\{\alpha,\beta\}$ is an adjacent pair $= \{\alpha,\alpha+1\}$
it is easy to check that $\varphi^{\alpha\beta}$ satisfies (1.11)
whenever $i,j \in I$ with $i_\mu = j_\mu$. The only nontrivial case
is where $\alpha = \mu$, and then
$i_{\{\alpha,\beta\}} = (j{:}\mu{:}i)_{\{\alpha,\beta\}}$ and
$j_{\{\alpha,\beta\}} = (i{:}\mu{:}j)_{\{\alpha,\beta\}}$ because
$i_\mu = j_\mu$. This shows that all of the coefficient vectors are
$( , )$ orthogonal to $\mathcal{L}_K$ and so lie in $B_K$. On the
other hand, if $\xi$ is orthogonal to all of these vectors it
satisfies (1.18) and there cannot exist a pair $\{\alpha,\beta\}$ in
the carrier which is non-adjacent, for then there would exist a
$\mu \in L$ with $\alpha < \mu < \beta$ and the coefficient vector of
some $L^{\mu}_{ij}$ would not be orthogonal to
$\varphi^{\alpha\beta}$. Since $\varphi^{\alpha\beta}$ only uses the
values of $i$ at the $\alpha$ and $\beta$ loci we can choose the $i$
and $j$ to be equal at the $\mu$ locus.


So a selection field satisfying only adjacent locus epistasis 
preserves the second order recombination ratios of (1.16) and also 
certain of the first order recombination ratios of (1.10)—but not 
all of them. 

The adjacent locus case is also one of the few cases other
than the disjoint bloc generalization of zero epistasis where we can
explicitly compute $\Lambda_K$. In fact suppose that $p^{\mu,\mu+1}$
is an interior distribution on $I_\mu \times I_{\mu+1}$ and the family
$\{p^{\mu,\mu+1} : \mu = 1,\dots,\ell-1\}$ is compatible in the sense
that $p^{\mu-1,\mu}$ and $p^{\mu,\mu+1}$ induce the same distribution
$p^{\mu}$ on $I_\mu$. Then define $p \in \Delta$ by:

(1.19)  $$p_i = p(i_1,\dots,i_\ell) = \frac{p^{1,2}(i_1,i_2)\, p^{2,3}(i_2,i_3) \cdots p^{\ell-1,\ell}(i_{\ell-1},i_\ell)}{p^{2}(i_2)\, p^{3}(i_3) \cdots p^{\ell-1}(i_{\ell-1})}$$

$p_i > 0$ for all $i \in I$ and $E^{\mu,\mu+1}(p) = p^{\mu,\mu+1}$,
as one sees by summing in order up the loci from $1$ to $\mu - 1$ and
down from $\ell$ to $\mu + 2$. At each sum step only one factor in the
numerator is affected and the resulting sum cancels a factor in the
denominator. $p \in \Lambda_K$ by Thm. II.1.6 because $\ln p_i$ is
clearly a member of $\mathcal{L}_K$. Notice that in this case the
anomaly about the image of $E^K$ remarked upon after Thm. II.2.7
does not occur.
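The construction (1.19) can be tried out on a small example. The sketch below is illustrative only: the function names, the 0-based locus indexing and the three-locus, two-allele data are assumptions made here. It builds $p$ from compatible pairwise distributions and recomputes the pairwise marginals, which should reproduce the inputs.

```python
from itertools import product


def adjacent_locus_distribution(pair_margs, single_margs):
    """Build p as in (1.19).

    pair_margs[mu][(a, b)] is the joint distribution of loci (mu, mu+1);
    single_margs[mu][a] is the distribution at locus mu (0-indexed).
    """
    n_loci = len(pair_margs) + 1
    alleles = [sorted(single_margs[mu]) for mu in range(n_loci)]
    p = {}
    for g in product(*alleles):
        num = 1.0
        for mu in range(n_loci - 1):
            num *= pair_margs[mu][(g[mu], g[mu + 1])]
        den = 1.0
        for mu in range(1, n_loci - 1):  # interior loci only
            den *= single_margs[mu][g[mu]]
        p[g] = num / den
    return p


def pair_marginal(p, mu):
    """Marginal distribution of p on the adjacent pair (mu, mu+1)."""
    out = {}
    for g, v in p.items():
        key = (g[mu], g[mu + 1])
        out[key] = out.get(key, 0.0) + v
    return out


# Assumed example: both pair distributions induce (0.4, 0.6) at locus 1.
p12 = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
p23 = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
singles = [{0: 0.5, 1: 0.5}, {0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}]
p_example = adjacent_locus_distribution([p12, p23], singles)
```

Read left to right, (1.19) is the joint law of a Markov chain along the chromosome, which is why the partial sums telescope.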
Disjoint Bloc Model ($K = T_1 \vee \dots \vee T_{\ell'}$ with
$\{T_a : a = 1,\dots,\ell'\}$ a disjoint family of subsets of $L$ with
union $L$): In this case $\xi \in \mathcal{L}_K$ iff it is a sum of
terms each depending on the loci in one of the separate blocs.


5 Proposition: For the disjoint bloc case
$K = T_1 \vee \dots \vee T_{\ell'}$, the orthogonal complement $B_K$
is spanned by the coefficient vectors of the family
$\{L^{S}_{ij} : i,j \in I$ and $S$ is a union of some of the
$T_a$'s$\}$, i.e. $S \cap T_a = T_a$ or $\emptyset$.

Proof: The special subsets $S$ that we are looking at are precisely
those for which $S$-recombinations do not break up blocs of alleles in
$K$. The $L^{S}_{ij}$'s are the $L^{\lambda}_{ij}$'s corresponding to
those maps $\lambda: L \to \{0,1\}$ which are constant on blocs of
$K$. The proof is then an easy analogue of the proof of Lemma
II.2.15. The details are left to the reader. QED


In this case $E^K$ maps $\Delta$ onto the product
$\prod_a \Delta_{T_a}$. Furthermore, $\Lambda_K$ can be explicitly
computed. Let $p^{T_a}$ be a distribution on $I_{T_a}$ for
$a = 1,\dots,\ell'$. Since the $T_a$'s are disjoint there are no
compatibility conditions necessary and we define $p \in \Delta$:

(1.20)  $$p_i = p(i) = p^{T_1}(i_{T_1}) \cdot \ldots \cdot p^{T_{\ell'}}(i_{T_{\ell'}}).$$

$p$ is in $\Lambda_K$ because $\ln p_i \in \mathcal{L}_K$ (see
Thm. II.2.7).

The analogue of Prop. 2 holds with the same proof:

6 Proposition: With respect to the Shahshahani metrics,
$E^K: \Lambda_K \to \prod_a \mathring{\Delta}_{T_a}$ is an isometric
diffeomorphism. If $m_{ij}$ is completely symmetric and satisfies $K$
type epistasis then $E^K$ maps $\bar\nabla(\tfrac12 \bar m)$ on
$\Lambda_K$ to the field which on the factor
$\mathring{\Delta}_{T_a}$ is $\bar\nabla(\tfrac12 \bar m^{T_a})$.

Neutral Loci ($K = S \subset L$): $\xi \in \mathcal{L}_S$ means that $\xi$ does not
depend on the value of the alleles at the loci of the complement $\tilde S$
(cf. II. (2.1)). By Prop. II. 2.2, is the kernel of D^. So by 
II.(2.5) a spanning set for $B_S$ is given by the coefficient vectors of
the family of functions:


(1.21)  $$\ln (p_i/p_j) = \ln p_i - \ln p_j, \qquad i,j \in I \text{ with } i_S = j_S.$$


$S$ type epistasis means that the alleles of the $\tilde S$ loci are
neutral, i.e. the fitness numbers do not depend upon them. If a
vectorfield on $\mathring{\Delta}$ has $S$ type epistasis then by
(1.21) the ratios
$p_{i_S i_{\tilde S}} / p_{i_S j_{\tilde S}}$ remain at some constant
value $C$. Rewrite this as

$$p_{i_S i_{\tilde S}} = C\, p_{i_S j_{\tilde S}}$$

and sum over the $i_S$ loci. We get that
$C = p^{\tilde S}(i_{\tilde S}) / p^{\tilde S}(j_{\tilde S})$. So for
any pair $i_{\tilde S}, j_{\tilde S}$ in $I_{\tilde S}$ the ratios

(1.22)  $$p^{\tilde S}(i_{\tilde S}) / p^{\tilde S}(j_{\tilde S})$$

remain constant under selection satisfying $S$ type epistasis.

One reason for considering these epistasis questions is the
hope of reducing the dimension of the problem. The dimension of
$\mathring{\Delta}$ is $n-1$ which might be quite large. On the other
hand $\Delta_K$, the image of $\Delta$ under $E^K$, might be much
smaller. If $X = \sum_i p_i \xi_i \partial_i$ is a general frequency
dependent vectorfield then the image of $X$ at
$p \in \mathring{\Delta}$ under the linear map $E^S$ is computed in
Prop. II.2.11.


It is:
(1.23)


x s = 




$X^S$ is a vector in the tangent space
$T_{p^S}(\mathring{\Delta}_S)$ where $p^S = E^S(p)$ is the
distribution induced from $p$ on the subproduct $I_S$ by the
projection from $I$, and $\xi^S$ is the conditional expectation
$E(\xi \mid S)$. Just as $\xi_i$ is the fitness of $i$ normalized to
mean zero, we call $\xi^S_{i_S}$ the fitness of $i_S$. It is the
average of $\xi$ over all of the gametic genotypes which carry the
alleles of $i_S$ at the loci of $S$. In the epistasis zero case with
$S = \{\alpha\}$, it is $m(i_\alpha) - \bar m$ computed by equation
(1.15).

So as $p$ flows in $\mathring{\Delta}$ along the vectorfield $X$,
$p^S$ flows in $\mathring{\Delta}_S$ tangent to $X^S$. However, as
(1.15) illustrates, there is an important difference. $X^S$ depends,
in general, on the distribution $p$ and not merely on its image
$p^S$. Geneticists refer to this as the effect of "genetic
background". It has a rather unpleasant effect: even if we completely
understand the genetic background, i.e. we know $X^S$ as a function
of $p$, we can't integrate $X^S$ to get a flow on
$\mathring{\Delta}_S$ because $X^S$ is not a vectorfield on
$\mathring{\Delta}_S$. Here $K$-type epistasis is helpful in two
ways. First, Cor. II.2.12 means that with $K$-type epistasis we can
recover the full vectorfield $X$ from the averaged data
$\{X^S : S \in K\}$. Second, if selection alone is acting then in the
presence of $K$-type epistasis each leaf of $\mathcal{T}_K$ (e.g.
$\Lambda_K$, the leaf of maximum entropy) is invariant under the
flow. Furthermore, Thm. II.2.7 implies that $E^K$ maps such a leaf
diffeomorphically onto $\mathring{\Delta}_K$. So in this case we do
just flow along $\mathring{\Delta}_K$ or at least a closely related
diffeomorphic image.

If $X$ does not exhibit $K$-type epistasis we can project
orthogonally (rel. $( , )_p$) to $T_p\mathcal{T}_K$ to get
$X_K = \sum_i p(i)\eta(i)\partial_i$. In general, the arguments of
Prop. II.2.11 and Cor. II.2.12 show that it is $X_K$ that we can
always recover from $\{X^S : S \in K\}$. Also by the commutative
diagram in Prop. II.2.11(b), $\eta$ is the $( , )_p$ orthogonal
projection of $\xi$ on $\mathcal{L}_K$. This means that the random
variable $\eta_i$ is the $\mathcal{L}_K$ approximation to $\xi_i$
with least mean square error, i.e. the error $\xi_i - \eta_i$ has
minimum variance. Furthermore, the variance of the error is just the
square of the length of $X - X_K$ at $p$ with respect to the
Shahshahani metric. In the case where
$X = \bar\nabla(\tfrac12 \bar m)$, Prop. II.2.14 and the Remark
following imply that $X_K$ at $p$ agrees with
$\bar\nabla_p(\tfrac12 \bar n)$ where $n_{ij}$ is the $\mathcal{L}_K$
approximation of $m_{ij}$ with respect to $( , )_{pp}$
($p_{ij} = p_i p_j$). However, the approximation $n_{ij}$ will vary
from point to point. Also, unless $p$ lies in the Wright manifold,
the projection from $m$ to $n$ might destroy complete symmetry.

There is an important reason for attempting this sort of reduction
in the disjoint bloc case, important beyond the mere convenience of
dealing with a small manifold.

As we remarked at the end of Sec. I.1 the vectorfield model is
a medium sized model adapted to handling only a small portion of the
genome. Our justification for considering it is the hope that we can
isolate a medium sized bloc of loci which may interact with one
another in determining a component of fitness but which are isolated
in their effects from the rest of the genome. In short, we hope that
the model we are looking at is a factor of some larger unseen
disjoint bloc model. To be precise, suppose that
$K = T \vee \tilde T$ with $T \subset L$ and what we see are the
images of selection, recombination and mutation under the map
$E^T: \Delta \to \Delta_T$. We call $T$ the set of observed loci and
$\tilde T$ the set of hidden loci. From this viewpoint it is
important to understand not only the dynamics of the vectorfield
model on $\Delta$ but also how much of the image of these dynamics on
$\Delta_T$ can be described by the projection of the fields on
$\Delta$. This is another form of the problem of genetic background.


7 Proposition: Put the Shahshahani metric on $\mathring{\Delta}_T$ as
well as $\mathring{\Delta}$ so that
$E^T: \mathring{\Delta} \to \mathring{\Delta}_T$ is a Riemannian
submersion. Let $m$ be a completely symmetric member of
$\mathbb{R}^{I \times I}$ exhibiting $T \vee \tilde T$ epistasis so
that $m = m^T + m^{\tilde T}$ where
$m^T \in \mathbb{R}^{I_T \times I_T}$ and
$m^{\tilde T} \in \mathbb{R}^{I_{\tilde T} \times I_{\tilde T}}$.

The leaf $\Lambda_{T \vee \tilde T}$ consists of those distributions
$p$ in $\mathring{\Delta}$ with respect to which the $T$ and
$\tilde T$ loci are independent, i.e. $d^T_i = 0$ for all $i$. On
$\Lambda_{T \vee \tilde T}$, $E^T$ maps the selection field
$\bar\nabla \tfrac12 \bar m$ to $\bar\nabla \tfrac12 \bar m^T$ on
$\mathring{\Delta}_T$. In general, at $p$, $E^T$ maps
$\bar\nabla \tfrac12 \bar m$ to
$\bar\nabla \tfrac12 \bar m^T + e(p)$ where the error term is:

(1.24)  $$e(p) = \sum \{ d^T_i \, m^{\tilde T}_{i_{\tilde T}} \, \partial_{i_T} : i \in I \}.$$

Here $d^T_i = p(i) - p(i_T)p(i_{\tilde T})$ and
$m^{\tilde T}_{i_{\tilde T}} = \sum \{ p(j_{\tilde T})\, m^{\tilde T}(i_{\tilde T}, j_{\tilde T}) : j_{\tilde T} \in I_{\tilde T} \}$.

If the selection field exhibits $T$ type epistasis, i.e.
$m^{\tilde T} = 0$ or a constant which can be absorbed in $m^T$, or
equivalently if $m$ depends only on the distributions of the alleles
in the $T$ loci, then $\bar\nabla \tfrac12 \bar m$ projects under
$E^T$ to $\bar\nabla \tfrac12 \bar m^T$ everywhere on
$\mathring{\Delta}$.

Proof: The first two paragraphs are excerpts of Thm. II.2.9 and
Prop. II.2.11. In particular, (1.24) follows from II.(2.24) and
II.(2.25). The $T$-epistasis case follows from (1.24) directly or
from Lemma II.2.10 applied to the Riemannian submersion $E^T$. QED


2. Recombination and Entropy.


Throughout this section and the next we will assume that the
birth rates $b_{ij}$ and recombination rates $r^S_{ij}$ are
nonnegative and completely symmetric. So for $S \subset L$, the
recombination field $R^S$ is defined by I.(7.5):

(2.1)  $$R^S = \sum_{i,j} r^S_{ij} b_{ij} d^S_{ij}\, \partial_i = (1/4) \sum_{i,j} r^S_{ij} b_{ij} d^S_{ij}\, \bar\nabla L^S_{ij}$$

where the functions $d^S_{ij}$ and $L^S_{ij}$ are defined by I.(5.1)
and I.(5.2). The recombination field is $-R$ where (cf. I.(7.2)):

(2.2)  $$R = \sum \{ R^S : S \subset L \}.$$

1 Theorem: The vectorfields $R^S$ and $R$ satisfy the following
equivalent conditions:

(a) They are tangent to the fibres of the projection
$E: \mathring{\Delta} \to \prod_\alpha \mathring{\Delta}_\alpha$
mapping a distribution to the set of marginal distributions.

(b) They are tangent to the fibre-foliation $\mathcal{D}$ associated
with the zero-epistasis complex $L^{(0)}$.

(c) $E$ maps these vectorfields to zero.

(d) The gene frequencies $p^{\alpha}_{i_\alpha}$ are integrals of the
motion of these vectorfields, i.e. they don't change as $p$ flows
according to these vectorfields.

Proof: These conditions are all satisfied by $\bar\nabla L^S_{ij}$
and are preserved by linear combination. So the result follows from
(2.1). QED


The key to our understanding of recombination is the fact that
$d^S_{ij}$ and $L^S_{ij}$ are measuring the same thing in slightly
different ways. In particular, they have the same signs. If $R^S$ had
$L^S_{ij}$ instead of $d^S_{ij}$ in its formula then $R^S$ would be a
gradient vectorfield because
$L^S_{ij} \bar\nabla L^S_{ij} = \tfrac12 \bar\nabla (L^S_{ij})^2$.
We are led to introduce a convenient function.

2 Lemma: Let $T$ be an open interval in $\mathbb{R}$ and
$f: T \to \mathbb{R}$ be a $C^\infty$ function. Define the difference
quotient $\Delta_f: T \times T \to \mathbb{R}$ by:

$$\Delta_f(s,t) = \begin{cases} \dfrac{f(s) - f(t)}{s - t} & s \ne t \\[1ex] f'(t) & s = t. \end{cases}$$

$\Delta_f$ is $C^\infty$ and if $s \ne t$ then
$\Delta_f(s,t) = f'(\theta)$ for some $\theta$ strictly between $s$
and $t$.

Proof: Changing variables to $(t,h)$ with $s = t + h$,
$\Delta_f(t,h) = h^{-1}[f(t + h) - f(t)]$ if $h \ne 0$. So by the
integral form of the remainder in Taylor's Theorem

$$\Delta_f(t,h) = \int_0^1 f'(t + hu)\, du$$

and the integral on the right is $C^\infty$ even when $h = 0$.
The last result is the Mean Value Theorem. QED
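The two descriptions of the difference quotient in Lemma 2 can be compared numerically. This is a minimal sketch, assuming the caller supplies both `f` and its derivative `fprime` (hypothetical parameter names) and using a simple midpoint rule for the integral.

```python
import math


def difference_quotient(f, fprime, s, t):
    """Difference quotient of Lemma 2, extended by f'(t) on the diagonal."""
    if s == t:
        return fprime(t)
    return (f(s) - f(t)) / (s - t)


def integral_form(fprime, s, t, steps=10000):
    """Midpoint-rule estimate of the integral of f'(t + h*u), u in [0, 1]."""
    h = s - t
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps
        total += fprime(t + h * u)
    return total / steps


# Assumed example with f = exp, so that f' = exp as well.
dq = difference_quotient(math.exp, math.exp, 1.0, 0.5)
integ = integral_form(math.exp, 1.0, 0.5)
```

The integral form is the one that stays well behaved as $s \to t$, which is what gives smoothness across the diagonal.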

3 Lemma: Let $O$ be the first quadrant of the plane $\mathbb{R}^2$,
i.e. $O = \{(s,t) : s,t \ge 0\}$ and let $\mathring{O}$ be the
interior where the inequalities are strict. On $O$ we define

$$Q(s,t) = \begin{cases} \dfrac{s - t}{\ln s - \ln t} & s \ne t \text{ and } s \cdot t > 0 \\[1ex] t & s = t > 0 \\ 0 & s \cdot t = 0. \end{cases}$$

$Q$ satisfies the following properties.

(a) $Q$ is continuous on $O$ and is infinitely differentiable on
$\mathring{O}$.

(b) $Q$ is non-negative with $\mathring{O} = \{Q > 0\}$.

(c) $Q(s,t)$ is between $s$ and $t$, strictly between if $s \ne t$
and $s \cdot t > 0$.

(d) $Q$ is homogeneous of degree 1, i.e.
$Q(\lambda s, \lambda t) = \lambda Q(s,t)$ for $\lambda \ge 0$.

(e) $Q$ is symmetric, i.e. $Q(s,t) = Q(t,s)$.

(f) $\frac{\partial Q}{\partial s}(s,t) = \frac{\partial Q}{\partial t}(t,s)$
are positive and homogeneous of degree 0 on $\mathring{O}$.


Proof: $Q$ is the reciprocal of the difference quotient for the
logarithm $\ln$ defined on the positive reals. Since the derivative
of $\ln t$ is $1/t$, (a), (b) and (c) on $\mathring{O}$ follow from
Lemma 2. (d) and (e) are obvious.

If $s$ approaches 0 while $t$ remains bounded away from 0 and
$\infty$, $Q(s,t)$ approaches 0 like $-1/\ln s$. By symmetry, $Q$
goes to 0 if $t$ approaches 0 alone. If both $s$ and $t$ approach 0
then they take $Q(s,t)$, which lies between them, along.

Finally, for (f):

$$\frac{\partial Q}{\partial s} = \frac{(t/s) - \ln(t/s) - 1}{(\ln(t/s))^2} \qquad (s \ne t).$$

This is positive because $r - 1 > \ln r$ for all positive $r \ne 1$.
When $s = t$, two applications of L'Hopital's Rule imply that
$\frac{\partial Q}{\partial s} = \frac12$. QED
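The function $Q$ of Lemma 3 is what is usually called the logarithmic mean of $s$ and $t$. The following sketch (the Python function name `Q` and the sample points are choices made here, not part of the text) implements the three cases of the definition so that properties (c), (d), (e) and the diagonal value of the partial derivative can be spot-checked.

```python
import math


def Q(s, t):
    """Logarithmic mean of s and t on the closed first quadrant."""
    if s < 0 or t < 0:
        raise ValueError("Q is only defined for s, t >= 0")
    if s == 0 or t == 0:
        return 0.0          # boundary case s * t = 0
    if s == t:
        return float(t)     # diagonal case s = t > 0
    return (s - t) / (math.log(s) - math.log(t))


def dQ_ds(s, t, h=1e-4):
    """Central finite-difference estimate of the partial derivative in s."""
    return (Q(s + h, t) - Q(s - h, t)) / (2 * h)
```

For instance `Q(2.0, 1.0)` equals $1/\ln 2$, which lies strictly between 1 and 2, and `dQ_ds(1.0, 1.0)` should come out close to $1/2$.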

4 Proposition: On $\Delta$ define the function
$Q^S_{ij} = Q(p_i p_j,\; p_{i_S j_{\tilde S}}\, p_{j_S i_{\tilde S}})$,
i.e.
$Q^S_{ij}(p) = Q\bigl(p(i)p(j),\; p(i_S j_{\tilde S})\, p(j_S i_{\tilde S})\bigr)$.
$Q^S_{ij}$ is continuous on $\Delta$ and $C^\infty$ on
$\mathring{\Delta}$. Furthermore $Q^S_{ij}$ is positive on
$\mathring{\Delta}$. Since $Q^S_{ij} = d^S_{ij} / L^S_{ij}$,

(2.3)  $$R^S = (1/8) \sum_{i,j} r^S_{ij} b_{ij} Q^S_{ij}\, \bar\nabla (L^S_{ij})^2.$$

Proof: This is clear from Lemma 3, (2.1) and the fact that
$\bar\nabla (L^S_{ij})^2 = 2 L^S_{ij} \bar\nabla L^S_{ij}$. QED

Remark: Since $d^S_{ij}$ and $L^S_{ij}$ are antisymmetric in the
variables $(i_S, j_S)$ and in $(i_{\tilde S}, j_{\tilde S})$, the
quotient $Q^S_{ij}$ and the square $(L^S_{ij})^2$ are symmetric in
these variables. The hypothesis of complete symmetry says that
$b_{ij}$ and $r^S_{ij}$ are also symmetric in these variables.

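Recall the definition of entropy $H: \Delta \to [0,\infty)$,
continuous on $\Delta$ and $C^\infty$ on $\mathring{\Delta}$:

$$H(p) = -\sum_i p_i \ln p_i = -\tfrac12 \sum_{i,j} p_i p_j \ln (p_i p_j).$$

For each marginal distribution $p^\alpha$ on $\Delta_\alpha$ we
define $H^\alpha: \Delta \to [0,\infty)$:

$$H^\alpha(p) = -\sum_{i_\alpha} p^\alpha(i_\alpha) \ln p^\alpha(i_\alpha) = -\sum_i p(i) \ln p^\alpha(i_\alpha) = -\tfrac12 \sum_{i,j} p_i p_j \ln \bigl(p^\alpha(i_\alpha)\, p^\alpha(j_\alpha)\bigr).$$

We define the product distribution function
$\pi: \Delta \to \bar\Lambda$ mapping $\mathring{\Delta}$ into
$\Lambda$ by

$$\pi(p)_i = \pi_i = \prod_{\alpha=1}^{\ell} p^\alpha(i_\alpha).$$

So we can define the normalized entropy $\bar H$ (cf. II.(1.11)):

(2.7)  $$\bar H(p) = H(p) - \sum_{\alpha=1}^{\ell} H^\alpha(p) = H(p) - H(\pi) = -\tfrac12 \sum_{i,j} p_i p_j \ln \frac{p_i p_j}{\pi_i \pi_j}.$$

$\bar H$ is nonpositive on $\Delta$ and vanishes on $\bar\Lambda$ by
Lemma II.1.7. Note that the $H^\alpha$'s depend only on the marginal
distributions and so these functions factor through
$E: \Delta \to \prod_\alpha \Delta_\alpha$.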
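As a concrete check of these definitions, the sketch below computes the normalized entropy of a small joint distribution. The dictionary representation (genotype tuples mapped to frequencies) and the two-locus example are assumptions made for illustration. The value should be nonpositive, and zero on a product distribution.

```python
import math
from itertools import product


def entropy(dist):
    """Entropy -sum p ln p of a distribution given as a dict of weights."""
    return -sum(v * math.log(v) for v in dist.values() if v > 0)


def marginal(p, alpha):
    """Marginal (gene frequency) distribution at locus alpha."""
    out = {}
    for g, v in p.items():
        out[g[alpha]] = out.get(g[alpha], 0.0) + v
    return out


def product_distribution(p):
    """Product of the single-locus marginals of p."""
    n_loci = len(next(iter(p)))
    margs = [marginal(p, a) for a in range(n_loci)]
    return {g: math.prod(margs[a][g[a]] for a in range(n_loci))
            for g in product(*(sorted(mg) for mg in margs))}


def normalized_entropy(p):
    """H(p) minus the sum of the single-locus entropies."""
    n_loci = len(next(iter(p)))
    return entropy(p) - sum(entropy(marginal(p, a)) for a in range(n_loci))


# Assumed two-locus example with positive association between loci.
p_example = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
pi_example = product_distribution(p_example)
```

Here `normalized_entropy(p_example)` is negative, while the same function applied to `pi_example` is zero up to rounding.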


5 Theorem: The following equation holds on $\mathring{\Delta}$:

(2.8)  $$(-R, \bar\nabla \bar H)_p = (-R, \bar\nabla H)_p = (1/4) \sum_{i,j,S} r^S_{ij} b_{ij} Q^S_{ij} (L^S_{ij})^2.$$

(a) The sum is nonnegative and vanishes exactly when $R$ vanishes.

(b) $R$ vanishes on the Wright manifold, $\Lambda$.

(c) $\bar\nabla H$ vanishes exactly at the point
$p \in \mathring{\Delta}$ where $p_i = 1/n$.

(d) $\bar\nabla \bar H$ vanishes exactly on the Wright manifold
$\Lambda$.

In particular, if $R$ vanishes only on $\Lambda$ then $\bar H$ is a
Lyapunov function for $-R$.

Proof: (2.8) follows from (2.3) by rewriting
$\bar\nabla L^2 = 2L\bar\nabla L$ and applying the equation
$(\bar\nabla_p H, \bar\nabla_p L)_p = -L(p)$ (cf. I.(4.13)). Note that
$\bar\nabla H$ and $\bar\nabla \bar H$ differ by the sum
$\sum_\alpha \bar\nabla H^\alpha$ which is parallel to the transverse
foliation $\mathcal{T}$ because the $H^\alpha$'s factor through $E$.
So this sum is perpendicular to the $\bar\nabla L^S_{ij}$'s, and
$(\bar\nabla_p \bar H, \bar\nabla_p L)_p = (\bar\nabla_p H, \bar\nabla_p L)_p$.

Since $Q^S_{ij}$ is positive on $\mathring{\Delta}$ the sum in (2.8)
is nonnegative and vanishes only when for all $S$, $i$ and $j$ either
$r^S_{ij} b_{ij}$ or $L^S_{ij}$ vanishes. In that case $R$ vanishes.
$R$ vanishes on $\Lambda$ because all of the $L^S_{ij}$'s do. This
proves (a) and (b).

For (c), $\bar\nabla_p H = -\sum_i p_i (\ln p_i + H)\partial_i$ and
so $\bar\nabla_p H$ vanishes in $\mathring{\Delta}$ only when all the
$\ln p_i$'s are equal.

(d) follows from Lemma II.1.7. QED

The question of whether $R$ vanishes only on $\Lambda$ is resolved,
in principle, by the following:

6 Proposition: $R$ vanishes only on $\Lambda$ iff the coefficient
vectors of the set of functions
$\{L^S_{ij} : r^S_{ij} b_{ij} > 0\}$ span the orthogonal complement
$B^{(0)}$ of the zero epistasis subspace $\mathcal{L}^{(0)}$. This
occurs iff for every $p \in \mathring{\Delta}$ the set of vectors
$\{\bar\nabla_p L^S_{ij} : r^S_{ij} b_{ij} > 0\}$ spans the tangent
space of the fibre foliation $\mathcal{D}$.

For example, if $r^S_{ij} > 0$ for all
$S = S_\mu = \{\alpha \le \mu\}$, $\mu = 1,\dots,\ell$, and there
exists $j \in I$ such that $b_{ij} > 0$ for all $i$, then $R$
vanishes only on $\Lambda$.


Proof: This result is based on the constructions for Thm. II.1.1, and
the special case of zero-epistasis from the previous Sec. Recall that
for $b \in \mathbb{R}^I$ we defined
$L^b: \mathring{\Delta} \to \mathbb{R}$ by
$L^b(p) = \sum_i b_i \ln p_i$. A linear relation among the $b$'s is
equivalent to a linear relation among the corresponding $L^b$'s or
again among the corresponding $\bar\nabla_p L^b$'s at every point $p$
of $\mathring{\Delta}$. The equivalence of the two different
conditions in the first paragraph follows from the fact that the map
$b \to \bar\nabla_p L^b$ is an isomorphism of $B^{(0)}$ onto the
tangent space $T_p\mathcal{D}$ (cf. II.(1.5)). Now let $\tilde B$ be
the subspace of $\mathbb{R}^I$ spanned by the coefficient vectors of
$\{L^S_{ij} : r^S_{ij} b_{ij} > 0\}$. Since $B^{(0)}$ is spanned by
coefficient vectors of all of the $L^S_{ij}$'s,
$\tilde B \subset B^{(0)}$. Notice that $R$ vanishes at $p$ iff
$L^S_{ij}(p) = 0$ whenever $r^S_{ij} b_{ij} > 0$.

Suppose $\tilde B = B^{(0)}$ and $R$ vanishes at
$p \in \mathring{\Delta}$. All of the $L^S_{ij}$'s are linear
combinations of those with $r^S_{ij} b_{ij} > 0$ because
$\tilde B = B^{(0)}$. So $L^S_{ij}(p) = 0$ for all $i,j$ and $S$.
Hence, $p \in \Lambda$.

On the other hand if $\tilde B$ is a proper subspace of $B^{(0)}$ let
$\tilde A$ be the orthogonal complement of $\tilde B$.
$\mathcal{L}^{(0)}$ is a proper subspace of $\tilde A$. In the
notation of Thm. II.1.1 $R$ vanishes at $p \in \mathring{\Delta}$ iff
$L^{\tilde B}(p) = 0$ and by that theorem $(L^{\tilde B})^{-1}(0)$ is
a submanifold of $\mathring{\Delta}$ containing $\Lambda$ but of
higher dimension. In this case the maps $E^a(p) = \sum_i a_i p_i$ for
$a$ in $\tilde A - \mathcal{L}^{(0)}$ give integrals of motion for
the recombination flow, additional to the individual gene frequencies
(cf. Thm. 1(d)) which come from $\mathcal{L}^{(0)}$.

In discussing the zero-epistasis example, we saw that the
coefficient vectors of $L^{\mu}_{ij}$, for $i \in I$, $\mu \in L$
with a fixed $j$ span $B^{(0)}$. This proves the example. QED

There is little harm in assuming that $r^S_{ij} > 0$ for all $i,j$ at
least with $S = S_\mu$, $\mu = 1,\dots,\ell$. When $b_{ij}$ is zero
rather than positive we say that the $ij$ zygote is sterile. A simple
example where $R$ vanishes on more than $\Lambda$ is the case of a
dominant lethal gamete-type, $\bar\imath$, meaning
$b_{\bar\imath j} = 0$ for all $j \in I$. If an allele at the
$\alpha$ locus is a dominant lethal gene then every gamete $i$ with
this allele at the $\alpha$ locus is a dominant lethal gamete-type.
To see that $R$ vanishes on more than $\Lambda$ we will show
$\mathcal{L}^{(0)}$ is a proper subspace of $\tilde A$, in the
notation of the above proof. In fact if $a_i = \delta_{i\bar\imath}$
(the Kronecker delta) then $E^a$ maps $R$ to 0. This is because, by
complete symmetry, the coefficient of $\partial_{\bar\imath}$ in
$R^S$ is zero if $b_{\bar\imath j} = 0$ for all $j$. Thus, projecting
the selection-plus-recombination vectorfield by $E^a$ eliminates $R$,
i.e. the frequency $p_{\bar\imath}$ is affected only by selection:

(2.9)  $$\frac{dp_{\bar\imath}}{dt} = p_{\bar\imath}(m_{\bar\imath} - \bar m).$$


Now to a biologist what happens is clear. A dominant lethal is
simply eliminated from the population by selection. This suggests
the conjecture that when there is sufficient sterility that $R$
vanishes on more than $\Lambda$ the population is driven out of the
interior $\mathring{\Delta}$ of $\Delta$ and all orbits of the
selection-plus-recombination field approach the boundary
$\Delta - \mathring{\Delta}$. But for a moment this is mathematically
puzzling. The relation between sterility and selection is:
$b_{ij} = 0$ implies $m_{ij} < 0$. This is because
$m_{ij} = b_{ij} - d_{ij}$ and the death rate $d_{ij}$ is assumed
$> 0$ for all $i,j$ (no immortality). But the sign of $m_{ij}$ is
irrelevant to the selection field on $\Delta$. Addition of any
constant to all of the $m_{ij}$'s doesn't affect selection on
genotype frequencies. So in theory a dominant lethal could be
selected for. The patient biologist then points out that the dominant
lethal increases in frequency under selection only if everything else
is being eliminated even faster. Indeed from (2.9), since
$m_{\bar\imath} = \sum_j p_j m_{\bar\imath j} < 0$, one of the
following must be true: $\frac{dp_{\bar\imath}}{dt} < 0$,
$\bar m < 0$ or $p_{\bar\imath} = 0$. This leads to:

The Sterility Conjecture: If $\{\bar\nabla L^S_{ij} : b_{ij} > 0\}$
does not span the tangent space $T\mathcal{D}$, then from every
initial position the flow of the selection-plus-recombination field
approaches the boundary $\Delta - \mathring{\Delta}$ or the
population size approaches 0 (extinction), where the population size
$|x|$ satisfies the equation:

(2.10)  $$\frac{d|x|}{dt} = \bar m\, |x|.$$

For a dominant lethal, $\bar\imath$, this is proved by looking not at
(2.9) but at the equation for the absolute number (cf. I.(1.1)):

(2.11)  $$\frac{dx_{\bar\imath}}{dt} = m_{\bar\imath}\, x_{\bar\imath}.$$

Since $m_{\bar\imath} < 0$, $x_{\bar\imath}$ approaches 0. Hence,
either the frequency $p_{\bar\imath} = x_{\bar\imath}/|x|$ approaches
0 or $|x|$ does.

From now on we will simply assume that $R$ vanishes only on
$\Lambda$, and so normalized entropy, $\bar H$, is a Lyapunov
function for $-R$.

Since we know that $\bar m$ increases under selection and $\bar H$
increases under recombination, it is of interest to consider the
opposite pairing and see how $\bar m$ behaves under recombination and
$\bar H$ under selection.

If there is no epistasis then the gradient of $\bar m$ is parallel
to the transverse foliation $\mathcal{T}$ and so is perpendicular to
the $\bar\nabla L^S_{ij}$'s and to $-R$. The extent to which there is
epistasis is measured by the functions
$e^S_{ij}: \Delta \to \mathbb{R}$ defined as follows:

(2.12)  $$e^S_{ij} = (\bar\nabla_p L^S_{ij}, \bar\nabla_p \tfrac12 \bar m)_p = m_i - m_{\bar\imath} - m_{\bar\jmath} + m_j \qquad (\bar\imath = i_S j_{\tilde S},\ \bar\jmath = j_S i_{\tilde S}).$$

These are linear functions, as the gamete frequencies are. In
fact we can define the numbers:

(2.13)  $$e^S_{ij,k} = (\bar\nabla_p L^S_{ij}, \bar\nabla_p m_k)_p = m_{ik} - m_{\bar\imath k} - m_{\bar\jmath k} + m_{jk}$$

and then we clearly have

(2.14)  $$e^S_{ij}(p) = \sum_k p_k\, e^S_{ij,k}.$$


The cumulative effect of epistasis, $e: \Delta \to \mathbb{R}$, which
measures the effect of recombination on fitness is given by:

(2.15)  $$e = (R, \bar\nabla_p \bar m)_p = (1/2) \sum_{i,j,S} r^S_{ij} b_{ij} d^S_{ij} e^S_{ij} = (1/2) \sum_{i,j,S} r^S_{ij} b_{ij} Q^S_{ij} L^S_{ij} e^S_{ij}.$$

So the selection-plus-recombination field acts on mean fitness
by the formula:

(2.16)  $$(\bar\nabla_p \tfrac12 \bar m - R, \bar\nabla_p \bar m)_p = V_A - e.$$

Here the first term on the left is the additive variance
$V_A = 2\sum_i p_i (m_i - \bar m)^2 = (\bar\nabla_p \tfrac12 \bar m, \bar\nabla_p \bar m)_p$
(see I.(6.3) and I.(6.9)).

At $p \in \mathring{\Delta}$ we can consider $i \mapsto \ln p_i$ as a
random variable and so define the covariance of $\ln p_i$ with
fitness:

(2.17)  $$\mathrm{Cov}(\ln p, m) = \sum_i p_i (\ln p_i)(m_i - \bar m) = \sum_i p_i (\ln p_i) m_i + H \bar m = \Bigl[\tfrac12 \sum_{i,j} p_i p_j \ln(p_i p_j)\, m_{ij}\Bigr] + H \bar m.$$

where $H$ is defined by (1.5) and
$\bar m = \sum_i p_i m_i = \sum_{i,j} p_i p_j m_{ij}$. Note that
because $\ln(p_i p_j) = \ln p_i + \ln p_j$ the covariance of
$\ln(p_i p_j)$ with $m_{ij}$ (relative to $p_i p_j$) is just twice
the covariance of $\ln p_i$ with $m_i$ (relative to $p_i$).

Direct substitution using the formula for $\bar\nabla H$ yields:

(2.18)  $$(\bar\nabla \tfrac12 \bar m, \bar\nabla H) = -\mathrm{Cov}(\ln p, m).$$
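Identities (2.17) and (2.18) can be spot-checked numerically. The sketch below is an illustration under stated assumptions: gametes are indexed $0,\dots,3$, the symmetric fitness matrix `m_example` is invented for the test, and the selection field is taken in its coordinate form $\dot p_i = p_i(m_i - \bar m)$. The rate of change of $H$ along that field, estimated by a central difference, should equal minus the covariance.

```python
import math


def mean_fitnesses(p, m):
    """Return the gamete fitnesses m_i and the mean fitness m_bar."""
    n = len(p)
    m_i = [sum(p[j] * m[i][j] for j in range(n)) for i in range(n)]
    m_bar = sum(p[i] * m_i[i] for i in range(n))
    return m_i, m_bar


def entropy(p):
    return -sum(x * math.log(x) for x in p)


def cov_logp_fitness(p, m):
    """Covariance of ln p_i with m_i relative to p, first form of (2.17)."""
    m_i, m_bar = mean_fitnesses(p, m)
    return sum(p[i] * math.log(p[i]) * (m_i[i] - m_bar) for i in range(len(p)))


def cov_second_form(p, m):
    """Second form of (2.17): sum p_i (ln p_i) m_i + H * m_bar."""
    m_i, m_bar = mean_fitnesses(p, m)
    return (sum(p[i] * math.log(p[i]) * m_i[i] for i in range(len(p)))
            + entropy(p) * m_bar)


def entropy_rate_under_selection(p, m, eps=1e-6):
    """Central-difference estimate of dH/dt along p_i' = p_i (m_i - m_bar)."""
    m_i, m_bar = mean_fitnesses(p, m)
    v = [p[i] * (m_i[i] - m_bar) for i in range(len(p))]
    fwd = [p[i] + eps * v[i] for i in range(len(p))]
    bwd = [p[i] - eps * v[i] for i in range(len(p))]
    return (entropy(fwd) - entropy(bwd)) / (2 * eps)


# Assumed example data: four gametes and a symmetric fitness matrix.
p_example = [0.1, 0.2, 0.3, 0.4]
m_example = [[1.0, 0.8, 0.6, 0.9],
             [0.8, 1.2, 0.7, 0.5],
             [0.6, 0.7, 0.9, 1.1],
             [0.9, 0.5, 1.1, 1.0]]
```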

7 Proposition: Let $p$ be any initial position in
$\mathring{\Delta}$. Let
$\{p_t : t \ge 0\} \subset \mathring{\Delta}$ be the positive orbit
of $p$ under the selection-plus-recombination field,
$\bar\nabla \tfrac12 \bar m - R$.

(2.19)  $$\limsup_{t\to\infty} \mathrm{Cov}(\ln p, m)(p_t) \ge 0, \qquad \limsup_{t\to\infty} e(p_t) \ge 0.$$

In particular, if $p_t$ approaches a limit $p_\infty$, an equilibrium
for the vectorfield, then we can replace lim sup by lim in (2.19). If
$p_\infty \in \mathring{\Delta}$ then these limits are both positive
(or both zero) if $p_\infty \notin \Lambda$ (resp. if
$p_\infty \in \Lambda$), provided $R$ vanishes only on $\Lambda$.

Proof: For $f = \bar m$ or $\bar H$, $f$ is bounded and smooth on
$\mathring{\Delta}$ and so the derivative along the path,
$\frac{df}{dt} = (\bar\nabla \tfrac12 \bar m - R, \bar\nabla f)_{p_t}$,
can't be bounded above zero, i.e. it cannot happen that
$\frac{df}{dt} \ge \epsilon$ for all $t \ge t_0$ and any
$\epsilon > 0$. This means $\liminf \frac{df}{dt} \le 0$. For
$f = \bar m$, for example, $V_A \ge 0$ implies:

$$-\limsup e \le \liminf \frac{df}{dt} \le 0.$$

This proves the second inequality in (2.19). The first is similar
using positivity of $(-R, \bar\nabla \bar H)_p$.

If $p_t$ approaches $p_\infty$ then by continuity the limits in
(2.19) exist and equal $\mathrm{Cov}(\ln p, m)(p_\infty)$ and
$e(p_\infty)$. At an internal equilibrium
$\bar\nabla \tfrac12 \bar m - R$ vanishes and so the left sides are
zero iff $V_A$ and $(-R, \bar\nabla \bar H)_p$ vanish, which they do
exactly when the equilibrium is in $\Lambda$. QED

Remark: Here we have used the fact that
$\bar\nabla_p \tfrac12 \bar m - R$ is parallel to or pointing inward
on each face of $\Delta$ and so by compactness arguments the positive
orbit $p_t$ is defined for all positive $t$ and lies in
$\mathring{\Delta}$ if $p$ does.

That $\mathrm{Cov}(\ln p, m)$ should tend to be positive is
intuitively appealing. A positive correlation between $\ln p_i$ and
$m_i$ means that the more fit genotypes are relatively more frequent
and one would expect this effect to be intensified by selection. This
argument is misleading. Under selection alone every orbit tends to an
equilibrium at which all of the genotypes which occur have the same
fitness, i.e. $V_A = 0$. So fitness $m_i$ tends to become
uncorrelated with anything. To suggest that recombination is
improving on selection by possibly allowing $\mathrm{Cov}(\ln p, m)$
to remain positive is probably a misinterpretation of the results.

That $e$ should tend to be positive is a weak generalization of
Felsenstein's results in [11] suggesting that $e^S_{ij}$ and
$d^S_{ij}$ tend eventually to have the same sign. This interpretation
is correct in the two-locus-two-allele model, where the sum in $e$
has essentially only one term. In general, $e$ is a large sum and we
can't say that all of the terms are positive.


3. Recombination and Epistasis.

In this section we examine the conditions under which the
recombination field is tangent to the maximum entropy leaf
$\Lambda_K$ of the transverse foliation $\mathcal{T}_K$ associated
with $K$ type epistasis. We will also see why this tangency usually
does not hold.

The results are exhibited most clearly in the case where the
birth rates and recombination rates are genotype independent.

1 Proposition: For $S \subset L$, $i \in I$ define
$d^S_i: \Delta \to \mathbb{R}$ by:

(3.1)  $$d^S_i = p(i) - p(i_S)\, p(i_{\tilde S}).$$

Assume $b_{ij} = b$ and $r^S_{ij} = r^S$. Then $R^S$ is given by:

(3.2)  $$R^S = r^S b \sum_i d^S_i\, \partial_i.$$

If $K$ is a complex of subsets of $L$, then $p \in \Lambda_K$ iff
$\ln p_i \in \mathcal{L}_K$ as a function of $i$. $R^S$ is tangent to
the transverse foliation $\mathcal{T}_K$ at $p$ iff
$p(i_S)p(i_{\tilde S})/p(i) \in \mathcal{L}_K$ as a function of $i$.

Proof: (3.2) is clear from (2.1). The criterion for
$p \in \Lambda_K$ comes from Thm. II.1.6. By Addendum II.1.3 $R^S$ is
tangent to $\mathcal{T}_K$ at $p$ iff, as a function of $i$,
$d^S_i/p(i)$ lies in $\mathcal{L}_K$. Since $\mathcal{L}_K$ contains
the constant functions, this is true iff
$p(i_S)p(i_{\tilde S})/p(i)$ lies in $\mathcal{L}_K$. QED
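Formulas (3.1) and (3.2) are easy to evaluate directly. The following sketch is illustrative only: it assumes genotype-independent rates `r` and `b` with made-up values, 0-based locus indices, and a distribution stored as a dictionary containing every genotype. It computes the coefficients of $R^S$ and confirms that they sum to zero over every set of gametes sharing an allele, i.e. that gene frequencies are left unchanged, as Theorem 1(d) of the previous section asserts.

```python
from itertools import product


def marginal_on(p, loci):
    """Marginal distribution of p on the given list of loci."""
    out = {}
    for g, v in p.items():
        key = tuple(g[a] for a in loci)
        out[key] = out.get(key, 0.0) + v
    return out


def disequilibrium(p, S):
    """d_i^S = p(i) - p(i_S) p(i_S~) for every genotype i, as in (3.1)."""
    n_loci = len(next(iter(p)))
    S = sorted(S)
    S_c = [a for a in range(n_loci) if a not in S]
    pS, pSc = marginal_on(p, S), marginal_on(p, S_c)
    return {g: v - pS[tuple(g[a] for a in S)] * pSc[tuple(g[a] for a in S_c)]
            for g, v in p.items()}


def recombination_field(p, S, r=0.1, b=1.0):
    """Coefficients of R^S in (3.2); r and b are assumed constants."""
    d = disequilibrium(p, S)
    return {g: r * b * d[g] for g in p}


def gene_frequency_change(field, alpha, allele):
    """Total coefficient over all gametes carrying `allele` at locus alpha."""
    return sum(v for g, v in field.items() if g[alpha] == allele)


# Assumed example: 3 loci, 2 alleles, weights 1..8 normalized to sum 1.
genotypes = list(product((0, 1), repeat=3))
p_example = {g: (k + 1) / 36.0 for k, g in enumerate(genotypes)}
field_example = recombination_field(p_example, S=[0])
```

The actual flow is along $-R^S$, but the sign does not affect the check, since every gene-frequency component is zero.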

Now $\ln p_i \in \mathcal{L}_K$ means that $p_i$ is a product of
functions in $\mathcal{L}_S$ for $S \in K$. This will sometimes imply
that $p(i_S)p(i_{\tilde S})/p(i)$ is a similar product, but only
rarely that it is a sum of functions in $\mathcal{L}_S$ for
$S \in K$. For example, in the disjoint bloc case,
$K = T_1 \vee \dots \vee T_{\ell'}$, define $S_a = S \cap T_a$ and
$\tilde S_a = (\tilde S)_a = \tilde S \cap T_a$ for
$a = 1,\dots,\ell'$. If $p \in \Lambda_K$ then independence implies
$p(i_S) = \prod_a p(i_{S_a})$ and so

(3.3)  $$p(i_S)\, p(i_{\tilde S}) / p(i) = \prod_a p(i_{S_a})\, p(i_{\tilde S_a}) / p(i_{T_a}) \qquad (p \in \Lambda_K).$$

Now the log of this function lies in $\mathcal{L}_K$ but the function
itself usually does not.

2 Corollary: Assume $b_{ij} = b$, $r^S_{ij} = r^S$ and
$K = T_1 \vee \dots \vee T_{\ell'}$ is a disjoint bloc model. If
$S \supset T_a$ or $\tilde S \supset T_a$ for all but at most one of
$a = 1,\dots,\ell'$ then $R^S$ is tangent to $\Lambda_K$ at all
points of $\Lambda_K$.

Proof: In this case $S_a = T_a$ or $\tilde S_a = T_a$ for all $a$ but
say $a_0$. So in the product on the right of (3.3) all of the factors
equal 1 except for the $a_0$ factor which depends only on $i_T$ with
$T = T_{a_0}$. QED

3 Corollary: Assume $b_{ij} = b$, $r^S_{ij} = r^S$ and
$K = \{1,2\} \vee \{2,3\} \vee \dots \vee \{\ell-1,\ell\}$ is the
adjacent locus interaction model. If
$S = S_\mu = \{\alpha \in L : \alpha \le \mu\}$ for some $\mu \in L$
then $R^S$ is tangent to $\Lambda_K$ at all points of $\Lambda_K$.

Proof: By computing with (1.19) it is not hard to show that for
$S = S_\mu$ and $p \in \Lambda_K$:

$$\bigl(p(i_S)\, p(i_{\tilde S})\bigr) / p(i) = \bigl(p^{\mu}(i_\mu)\, p^{\mu+1}(i_{\mu+1})\bigr) / p^{\mu,\mu+1}(i_\mu, i_{\mu+1}).$$

The function on the right depends only on the pair of loci
$\{\mu,\mu+1\} \in K$. So it is a function in $\mathcal{L}_K$. QED

There is another very special case of the disjoint bloc model
where $\Lambda_K$ tangency holds even for the more general
recombination fields of (2.1).


4 Lemma: Let S,T <z L. If p e A_ then 

- ^ TVT 


(3.4) d S . = p(i~)p(j^)d SnT + p(i )p(j )d Sn T 
ID T T i t D t T * J T 1 -D 5 ; 


/ . V , . \ ..SHT / • \ / • \ -^SPlT 

= P(is;)p(Dq;)d i i + P(i T )p(D T )d ;U . 

iji-j ij) t j t 


sot sot 

d. . d. . 

1 j m j~ 

m - 7 m m m 
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_ _ sflT S 

where i = i j~ and j = j i~ and d. . is the analogue of d.. on A . 

S J S J J S S i t D t * 13 T 

In the last sum the first and third terms vanish if 


p e A . . or if S c T. The second and third terms vanish if 
T U VT 


p € A ^ 
TVT 
Of T. 


(0) 


( 0 ) 

or if S c T. T is the complex of singleton subsets 


Proof : By (1.20), if p 6 A TV ~, then 


d ij = P (i T )p(j T )p(i T )p(j T ) " P (i T )p(j T )P(i T )p(j T ) ' 


separate the two terms. Then add and subtract p(i~)p(j~)p(i T )p(j^) 

to get (3.4). If p e A . . then the loci of T are all independent 

TVT w 

Sf|T ~ ~ 0 

and so d. . =0. If S c T then S D T = S and S D T = 0. cr is 


Vt 

always 0. Similarly, for the complementary cases. 


QED 


Now apply Addendum II.1.3 just as in the proof of Proposition 

s s s s 

1. If r. . = r then R is tangent to a t , at p iff H.b. .d. ,/p, lies 

13 ^ K ^ 3 13 13 *i 

in as a function of i. Now suppose that p e A TV ^ and 

T T T 

b = b e R . Then we can sum on j by summing first on j~ and 

then on j to get: 


(3 


cx V" -J 1, /■ x V . ^sot. ,. n ,t , t _sdt _ 

.5) > b. . d. . = p(i~) > b. . + d. [p (i_)b. - > b. . d. . ], 

/ in 13 T / 13 i~ -^ T l / _ 13 13 

/ T J T J •• 1 t j T t T 4 "' ’ T J T T J T 


where 


T V" , T V" , T 

^ 3 i t D t Drp 1^3 


T T" 1 T 


and 


d SDT _ d SDT 

S .I — ^"T^T 

^ T 
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(following (3.1)), 


If we divide by $p(i) = p(i_T)p(i_{\tilde T})$ the first term then depends only on $i_T$. The second is the product of an $i_T$ function and an $i_{\tilde T}$ function, which causes the problem. However, if $p \in \Lambda_{T \vee \tilde T^{(0)}}$ then $d^{S\cap\tilde T} = 0$ and so the second term doesn't occur. By the same argument with $b_{ij}$ replaced by $r^S_{ij} b_{ij}$ we prove:


5 Proposition: Assume that $r^S_{ij}$ and $b_{ij}$ are completely symmetric members of $\mathbb{R}^{I_T \times I_T}$, i.e. the $\tilde T$ loci are neutral with respect to birth rates and recombination rates. If $K = T \vee \tilde T^{(0)}$ then the recombination fields $R^S$ are tangent to $\Lambda_K$ at all points of $\Lambda_K$ for all $S \subset L$.


In order to apply these results, we compute the image of the recombination fields under the projection $E^T: \Delta \to \Delta_T$.


6 Proposition : 


(3.6) 


ET '’ L ii> ■ ;i y T 


(s = s' n t) 


where the gradient on the right is taken with respect to the 

Shahshahani metric on A T . 

S' S' 

Assume r.. = r, . for all i and j. Then 
!D i t D t 


m o 

(3.7) E (R 


) = y~ r i .1 ( E ( b l V j T )P i P-j - E ( b l V j T )p I PT ) 9 t 

rpj ip rp J rp rp-Jiprp 


^■ip 3 3 rp 


- 0/4) ^ jT <4tt>|i T .i T )P iT P j 

— — — s 

- E (b Ii m ,j m ) Pt Pt )7L. . 

I t j T *i_j l 

rp rp rp-» rp 


lr p^ 3 ,p 
where $E(b \mid i_T, j_T)$ is the conditional expectation of $b_{ij}$ assuming $i_T$ and $j_T$ are known, with distribution $p_i p_j$ on $I \times I$, $\bar i_T = i_{S'\cap T}\, j_{\tilde S'\cap T}$ and $S = S' \cap T$.

T S 1 

So E (Z{R : S' fl T = S}) is given by the same sums with 


o ST S 

: i i replaced by r , 3 , =E{r. . : S' n T = S). 

T^T 1 T“^T 1 T“^T 


Proof: Since E (d . ) = d. and (i)_ = i_, i.e. (i_,j~.)_ = i_i_ 

- l i T T S S T S J T-S 

(3.6) is clear. From this (3.7) follows because: 


Z fb. o_p : k = i and jfc = j } =E(b|i , j )p. p. 
krk T T T J T 1 T T i j r 


by definition of conditional expectation. 


Remarks: (a) It follows from (3.6) and Lemma II.1.10 that the horizontal projection of $\bar\nabla L^{S'}_{ij}$, i.e. the projection of $\bar\nabla L^{S'}_{ij}$ perpendicular to the fibres of $E^T$, is given by $\bar\nabla (L^S_{i_T j_T} \circ E^T)$. For by Prop. II.1.11 $E^T$ is a Riemannian submersion.

(b) The form of (3.7) is similar to I.(7.1) or I.(7.4) but not necessarily to I.(7.5) even if $b_{ij}$ was initially completely symmetric. If b depends only on $i_T$ and $j_T$, i.e. $b = b^T \in \mathbb{R}^{I_T \times I_T}$, then $E(b \mid i_T, j_T) = b^T_{i_T j_T}$. Also, if the T and $\tilde T$ loci are independent, i.e. $p \in \Lambda_{T \vee \tilde T}$, Prop. II.1.14(a) implies that $E(b \mid T)$ is completely symmetric if b is. But away from $\Lambda_{T \vee \tilde T}$ we may lose complete symmetry by projecting and so have observed position effects of the projected field even if there were no position effects in the original.


7 Corollary : Let K = T^ V...V T be a disjoint bloc complex. For 

S S 

S c L, define S = S fl T . Assume r. . = r for all S and b. . = b. 

a a 13 13 


Define R^ to be the recombination field for (as in (3.2)) but wit] 





S S ,T 

r replaced by r a = 2{r : S' fl T = S }. Then 

(a) E K (R S ) = 

&' S 

(b) 2 R is tangent to A at all points of A . 

a=l a K K 

JL 1 S 

(c) At p e A , S ,R is the ( , ) orthogonal projection of 

K a—1 a p 

rS on Vk- 

K T 

Proof; E is the product of the maps E with T = T . (3.7) makes it 

Tb o T c T a c 

clear that E (R^) = 0 if a / b and E (R^) = E ( R ) • ( a ) follows. 

(b) follows from Cor. 2. (c) follows from (a) and (b) and Prop. 

11.2.11(b). QED 


Remarks: (a) This result illustrates again that tangency problems arise from recombination occurring in more than one bloc at once.

(b) Note that recombination among blocs is invisible with respect to $E^K$, i.e. if $S_a = T_a$ or $\emptyset$ for all a, then $E^K(R^S) = 0$ and $R^S|\Lambda_K = 0$, assuming as above that the birth rates are constant.

As was remarked at the end of Sec. 1, it is best to regard the vectorfield model as part of a larger disjoint bloc model. It then becomes important to study the relation between the large model and its projection to $\Delta_T$. Recall that we call the loci in T the observed loci and the remaining loci, those in $\tilde T$, the hidden loci. By the observed recombination or selection field we will mean the image of the recombination or selection field under the projection $E^T: \Delta \to \Delta_T$.


Birth and Recombination Rates Independent of Hidden Loci? 


Hidden Loci Contribute Additively to Death Rates: This means that 


o I xl 

S T T 

r_ and b_^_. are completely symmetric members of P and d_^_. shows 
T V type epistasis. So m^j shows T V T^^ type epistasis and


the selection field is tangent to A 


TVT 


(or 


By Corollary 5 the recom¬ 


bination fields are also tangent to A 


TVT 


( 0 ) - 


So we can assume that 


the hidden loci are in linkage equilibrium with each other and with 

the observed loci. Restricting to this submanifold, the observed 
-IT 

selection field is V “ m by Prop. 1.7 and the observed recombination 

S ST 

fields are of the form (3.1) with r replaced by r 9 , i by i , etc., 

by Prop. 6. This is the nice case in which the genetic background 
has no observable effect. 


Recombination Rates Constant? Birth and Death Rates Show TVT 

T T T T 

Type Epistasis : This means that b=b +b,d=d + d . On A TV ~ 

-IT 

the observed selection field will be y — m by Prop. 1.7 again and 

for the observed recombination fields there will be one term of the 
S ST T 

form (3.1) with r replaced by r^' and b by b , etc. plus a term 
T 

contributed by b . The latter term will be of the form (3.2) with 
S ST 

r replaced by r 9 , i replaced by i , etc. and with b replaced by 
T 

E(b ). So on A tv ^ the effect of genetic background will appear by 
varying the strength of this added recombination term. However, A TV ~ 
is not an invariant submanifold for recombination in A and so we 
may move off it. Once we do the observed loci are no longer indepen¬ 
dent of the hidden loci, observed position effects will appear unless 
T 

b = 0. In the observed selection field new terms will appear 

T 

depending on the contributions to fitness of the hidden loci, m , and 


the distance from A m ~ as measured by the functions d. (see again 
TVT 2 l ^ 


Prop. 1.7). 
4. Position Effects.

For simplicity we will assume that $r^S_{ij}$ is completely symmetric and focus on $b_{ij}$, which we will assume are positive for all $i, j \in I$. The recombination vectorfields are given by (cf. I.(7.1) and I.(7.4)):


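(4.1)    $R^S = \sum_{i,j} r^S_{ij}\,(b_{ij} p_i p_j - b_{\bar i \bar j} p_{\bar i} p_{\bar j})\,\partial_i$

             $= (1/4) \sum_{i,j} r^S_{ij}\,(b_{ij} p_i p_j - b_{\bar i \bar j} p_{\bar i} p_{\bar j})\,\bar\nabla L^S_{ij}$

with $\bar i = i_S j_{\tilde S}$ and $\bar j = j_S i_{\tilde S}$. The recombination field is still $-R$ with $R = \sum \{R^S : S \subset L\}$.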

The conditions of Theorem 2.1 still hold for the general recombination field. Furthermore, we can mimic Prop. 2.4 by defining:

(4.2)    $L^{S,b}_{ij}(p) = \ln (b_{ij} p_i p_j / b_{\bar i \bar j} p_{\bar i} p_{\bar j})$

             $= \ln (b_{ij} p_i p_j) - \ln (b_{\bar i \bar j} p_{\bar i} p_{\bar j})$

             $= L^S_{ij}(p) + \ln (b_{ij} / b_{\bar i \bar j})$.

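(4.3)    $Q^{S,b}_{ij}(p) = Q(b_{ij} p_i p_j,\; b_{\bar i \bar j} p_{\bar i} p_{\bar j})$.

Since $L^{S,b}_{ij}$ and $L^S_{ij}$ differ by a constant, they have the same gradient. So $L^{S,b}\,\bar\nabla L^S = L^{S,b}\,\bar\nabla L^{S,b} = \tfrac{1}{2}\bar\nabla (L^{S,b})^2$. From this follows the analogue of (2.3):

(4.4)    $R^S = (1/8) \sum_{i,j} r^S_{ij}\, Q^{S,b}_{ij}\, \bar\nabla (L^{S,b}_{ij})^2$.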

But if we take the inner product with the gradient of entropy 
we run into trouble because 
and the right side need not be positive. However, there is a special


case where we can generalize Theorem 2.5: 


1 Theorem: The following conditions on $b \in \mathbb{R}^{I \times I}$ are equivalent and define the condition: $b_{ij}$ shows simple position effects.

(a) There exists $q \in \mathbb{R}^I$ with $q_i > 0$ for all i such that

(4.5)    $b_{ij} / b_{\bar i \bar j} = q_i q_j / q_{\bar i} q_{\bar j}$

for all $i, j \in I$ and $S \subset L$, where $\bar i = i_S j_{\tilde S}$ and $\bar j = j_S i_{\tilde S}$.

(b) There exists $p \in \mathring{\Delta}$ such that the functions $\{L^{S,b}_{ij} : i, j \in I,\ S \subset L\}$ vanish simultaneously at p.

(c) There is a leaf of the transverse foliation $\mathcal{T}$ such that $R^S$ vanishes exactly on the leaf for all positive choices of the $r^S_{ij}$'s.

(d) The vectorfields $\{\bar\nabla (L^{S,b}_{ij})^2 : i, j \in I,\ S \subset L\}$ are coherent in the sense that if $X^S_{ij} \ge 0$ for all i, j, S and

    $\sum X^S_{ij}\, \bar\nabla (L^{S,b}_{ij})^2 = 0$  at $p \in \mathring{\Delta}$

then $X^S_{ij} L^{S,b}_{ij} = 0$ for all i, j, S at p.
ID id 


If $b_{ij}$ shows simple position effects define, for q satisfying condition (a):

(4.6)    $H^q(p) = H(p) - E^{\ln q}(p)$

             $= -\sum_i p_i \ln p_i - \sum_i p_i \ln q_i = -\sum_i p_i \ln (p_i q_i)$

             $= -\tfrac{1}{2} \sum_{i,j} p_i p_j \ln (p_i q_i p_j q_j)$.
The following equation holds on $\Delta$:

(4.7)    $(-R, \bar\nabla H^q)_p = (1/4) \sum_{i,j,S} r^S_{ij}\, Q^{S,b}_{ij}\, (L^{S,b}_{ij})^2$.

The sum is nonnegative and vanishes exactly when R vanishes, and this is on the leaf of $\mathcal{T}$ defined by the points on which the $L^{S,b}_{ij}$ vanish, i.e. points at which the probability distribution $p^q$ is in $\mathring{\Lambda}$, where:

(4.8)    $p^q(i) = p_i q_i / \sum_j p_j q_j$.


Proof: We begin by assuming (a) and prove (4.7). From I.4.13 we have:

    $L^{S,b}_{ij} = L^S_{ij} + \ln q_i - \ln q_{\bar i} - \ln q_{\bar j} + \ln q_j$

             $= -(\bar\nabla H, \bar\nabla L^S_{ij})_p + (\bar\nabla E^{\ln q}, \bar\nabla L^S_{ij})_p$.

Since $\bar\nabla L^{S,b} = \bar\nabla L^S$ we have

(4.9)    $(\bar\nabla H^q, \bar\nabla L^{S,b}_{ij})_p = -L^{S,b}_{ij}$.

(4.7) now follows just as (2.8) did. Just as in Theorem 2.5, the sum is positive and vanishes where R does, which is when all of the $L^{S,b}_{ij}$'s do. Defining for $p \in \mathring{\Delta}$ the vector x by $x_i = p_i q_i$, we see that $L^{S,b}_{ij}(p) = L^S_{ij}(x)$ and since $L^S_{ij}$ is homogeneous of degree zero we can normalize to get

(4.10)    $L^{S,b}_{ij}(p) = L^S_{ij}(p^q)$.

Thus, the $L^{S,b}_{ij}$'s vanish at p iff the $L^S_{ij}$'s vanish at $p^q$ iff $p^q \in \Lambda$.
The solutions of $\{L^{S,b}_{ij} = 0\}$ are the same as the solutions of $\{L^S_{ij} = -L^S_{ij}(q)\}$, which defines a leaf of $\mathcal{T}$ if the constants are consistent, i.e. if any solution exists. Solutions exist because $p_i = q_i^{-1} / \sum_j q_j^{-1}$ is a solution with $p^q(i) = 1/n$. This proves (4.8) and also shows that (a) implies (c).

Also, (a) implies (d) because by (4.9)

    $(-\bar\nabla H^q,\ \sum X^S_{ij}\, \bar\nabla (L^{S,b}_{ij})^2)_p = 2 \sum X^S_{ij}\, (L^{S,b}_{ij})^2$

and the right side vanishes iff $X^S_{ij} L^{S,b}_{ij} = 0$ for all i, j, S.

(c) implies, by varying the $r^S_{ij}$'s, that all of the $L^{S,b}_{ij}$'s vanish on the specified leaf. This implies (b).

If (b) holds and $L^{S,b}_{ij}(p) = 0$ for all i, j, S then (4.5) holds with $q_i = (p_i)^{-1}$ and this implies (a). Thus, (a) - (c) are equivalent and (a) implies (d).

Finally, if (d) holds then consider the vectorfield $-R$ on $\mathring{\Delta}$. This vectorfield extends to $\Delta$ using the original definition of (4.1). If $p_i = 0$ the coefficient of $\partial_i$ in $-R$ is positive and if $p_i = 1$ the coefficient of $\partial_i$ is negative. So on the $\Delta$-closure of any leaf of $\mathcal{D}$, $-R$ is a vectorfield on a convex cell pointing inward at the boundary. So by one of the equivalents of the Brouwer Fixed Point Theorem it vanishes somewhere in the open cell. Applying (d) at this point with $X^S_{ij} = r^S_{ij} Q^{S,b}_{ij} > 0$ we get that the functions $\{L^{S,b}_{ij}\}$ vanish simultaneously at some point. This is (b).

QED


Remarks: (a) We can obtain a Lyapunov function analogous to H but in a rather noncomputable fashion. Let $\Lambda_b$ denote the leaf at which the $L^{S,b}_{ij}$'s all vanish. Since $E|\Lambda_b : \Lambda_b \to \prod_\alpha \Delta_\alpha$ is a diffeomorphism we can define

(4.11)    $H^b = H^q - H^q \circ (E|\Lambda_b)^{-1} \circ E$.

Then $H^b$ equals 0 on $\Lambda_b$ and differs from $H^q$ by a function which factors through E. So (4.7) holds with $\bar\nabla H^q$ replaced by $\bar\nabla H^b$. Furthermore, $\bar\nabla H^b$ vanishes on $\Lambda_b$ just as in Lemma II.1.7, i.e. $\bar\nabla H^q$ is tangent to $\Lambda_b$, because at each point of $\Lambda_b$ the restriction of $H^q$ to the perpendicular leaf of $\mathcal{D}$ takes on its maximum value by the same argument as Prop. II.1.6.

The projection map $(E|\Lambda_b)^{-1} \circ E : \mathring{\Delta} \to \Lambda_b$ is rather difficult to compute. One can only say that by Prop. II.2.11(b) its tangent map at points of $\Lambda_b$ is the orthogonal projection of $T_p\mathring{\Delta}$ on $T_p\Lambda_b$.

(b) The choice of q is not uniquely defined by (4.5). But if $r_i$ is the ratio of two different choices for $q_i$ then $r_i > 0$, and $r \in \mathbb{R}^I$ is such that $r_i r_j$ is completely symmetric. If we normalize r to get an element of $\mathring{\Delta}$, i.e. divide by $|r| = \sum_i r_i$, Prop. II.2.14(a) implies that the resulting distribution lies in $\Lambda$, i.e. $r_i = C \prod_\alpha r^\alpha_{i_\alpha}$. Conversely, if we multiply a solution $q_i$ of (4.5) by $C r_i$ for $r \in \Lambda$ we get another solution of (4.5). $p^q$ and $H^q$ depend on the choice of q, but for differing choices of q the corresponding $H^q$'s differ by $E^{\ln r}$. This factors through E since r is a multiple of a distribution in $\Lambda$. In particular, the differences are canceled by normalizing and $H^b$ depends only on $b_{ij}$, which is why we did not denote it $H^q$.

(c) Another way of describing $\Lambda_b$ is to say it consists of all $p \in \mathring{\Delta}$ such that $b_{ij} p_i p_j$ is completely symmetric, if any such p's exist. $b_{ij}$ shows simple position effects when such p's exist (this is part (b) of the definition). This suggests a function space interpretation for simple position effects. Let $\mathrm{Sym} \subset \mathbb{R}^{I \times I}$ be the subspace of symmetric functions and $\mathrm{Sym}^* \subset \mathrm{Sym}$ be the subspace of completely symmetric functions. Let $\mathbb{R}^{I \times I}_{L+L}$ be the subspace consisting of functions of ij which are sums of functions of i alone and of j alone. In the notation of Sec. II.2 this is the set of functions whose carriers are contained in $L + L = L \times \{\emptyset\} \vee \{\emptyset\} \times L$. $\mathrm{Sym} \cap \mathbb{R}^{I \times I}_{L+L}$ defines $\mathrm{Sym}_{L+L}$, which is isomorphic to $\mathbb{R}^I$ as each element is of the form $u_i + u_j$ for some $u \in \mathbb{R}^I$. Now $\ln b$, defined by $(\ln b)_{ij} = \ln b_{ij}$, is always a member of Sym. If it lies in $\mathrm{Sym}^*$ there are no position effects. If it lies in $\mathrm{Sym}^* + \mathrm{Sym}_{L+L}$ the position effects are simple.

I don't know of any way of detecting whether or not position effects are simple. The obvious approach is to use the above function space interpretation. If $b_{ij}$ is not completely symmetric, i.e. $\ln b \notin \mathrm{Sym}^*$, one tries to project to $\mathrm{Sym}^*$ and check whether the error lies in $\mathrm{Sym}_{L+L}$. In the notation of Thm. II.2.9, we define $S: \mathbb{R}^{I \times I} \to \mathrm{Sym}^*$ by

(4.12)    $S(u) = 2^{-\ell} \sum \{T_Q(u) : Q \subset L\}$.

With respect to the standard Euclidean metric ( , ) on $\mathbb{R}^{I \times I}$, S is the orthogonal projection because the $T_Q$'s are isometries, see Prop. II.2.14. I conjectured in vain that $b_{ij}$ has simple position effects iff $\ln b - S(\ln b) \in \mathrm{Sym}_{L+L}$. While clearly sufficient, this condition is not necessary because S does not preserve $\mathrm{Sym}_{L+L}$.

I am also unable to concoct any convincing biological reason why position effects, if they occur, are likely to be simple. For example, I see no reason that the observed position effects of the previous sections need be simple. The simplicity is exclusively in the fact that we can analyze the model. Even if nonsimple position effects occur, we can apply the Brouwer Theorem to $-R$ on each leaf of the fibre foliation. There exist points of each leaf on which R vanishes, because each leaf is an open convex cell and $-R$ is directed inwards at the boundary. However, the failure of condition (c) means that the equilibria for R depend on the recombination rates, $r^S_{ij}$, as well as on $b_{ij}$. In the case of simple position effects the set of equilibria of R, namely $\Lambda_b$, depends only on b. If there are no position effects then the set of equilibria of R is $\Lambda$ and so is independent of b, too.

The portrait painted by Theorem 1 is structurally stable. This means that if the nonsimple position effects are small enough, i.e. $\ln b$ is close enough to the subspace $\mathrm{Sym}^* + \mathrm{Sym}_{L+L}$, then there is a smooth manifold of equilibria, $\tilde\Lambda$, for R close to $\Lambda_{\tilde b}$, where $\ln \tilde b$ is the projection of $\ln b$ on $\mathrm{Sym}^* + \mathrm{Sym}_{L+L}$. This follows from a parametrized version of the structural stability theorem for linearly stable equilibria. Apply the theorem to $-R$ on each leaf of the foliation $\mathcal{D}$. Furthermore, $\tilde\Lambda$ is globally attracting for $-R$ on each leaf, again assuming $\ln b$ is close enough to $\mathrm{Sym}^* + \mathrm{Sym}_{L+L}$.

One case where these results always apply is to the two-locus-two-allele model ($n_1 = n_2 = \ell = 2$). There is not too much room in such a model so the position effect, if it occurs, is simple.


2 Corollary: Consider a two-locus-two-allele model with $\ell = 2$, $I_1 = \{A, a\}$ and $I_2 = \{B, b\}$. We number the gametes: 1 = AB, 2 = Ab, 3 = aB, 4 = ab. The only nonzero functions $L^S_{ij}$ are:

(4.13)    $\pm L = \pm \ln (p_1 p_4 / p_2 p_3)$.

Assume all of the birth rates are positive and let $\rho$ denote the ratio of the "coupling" to "repulsion" birth rates, i.e. $\rho = b_{14}/b_{23}$. If $\rho = 1$ there are no position effects. If $\rho \ne 1$ then b shows only simple position effects.

The gene frequencies $p_A = p_1 + p_2$ and $p_B = p_1 + p_3$ are invariant under the recombination field and on each cell defined by a choice of $p_A$ and $p_B$ the recombination field approaches the point satisfying $L = -\ln \rho$. If $\rho = 1$, this is the distribution with $p_1 = p_A p_B$, etc.

Proof: Let $q_1 = \rho$, $q_2 = q_3 = q_4 = 1$. (4.5) is satisfied by inspection. QED
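
As a numerical illustration of Corollary 2, the following sketch locates the point of a cell with fixed gene frequencies at which $\ln (p_1 p_4 / p_2 p_3) = -\ln \rho$. The parametrization by a linkage disequilibrium D, the function names and the sample values are assumptions made for this example only; they are not taken from the text.

```python
import math

def gametes(p_A, p_B, D):
    """Gamete frequencies (p1, p2, p3, p4) = (AB, Ab, aB, ab) for disequilibrium D."""
    return (p_A * p_B + D,
            p_A * (1 - p_B) - D,
            (1 - p_A) * p_B - D,
            (1 - p_A) * (1 - p_B) + D)

def recombination_equilibrium(p_A, p_B, rho, iterations=200):
    """Bisect on D for the point where ln(p1 p4 / p2 p3) = -ln(rho)."""
    low = -min(p_A * p_B, (1 - p_A) * (1 - p_B))   # p1 or p4 reaches 0 here
    high = min(p_A * (1 - p_B), (1 - p_A) * p_B)   # p2 or p3 reaches 0 here
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if mid <= low or mid >= high:
            break  # interval can no longer be split in floating point
        p1, p2, p3, p4 = gametes(p_A, p_B, mid)
        # ln(p1 p4 / p2 p3) + ln(rho) is strictly increasing in D
        if math.log(p1 * p4) - math.log(p2 * p3) + math.log(rho) > 0:
            high = mid
        else:
            low = mid
    return gametes(p_A, p_B, 0.5 * (low + high))

# Hypothetical sample values: p_A = 0.3, p_B = 0.6, rho = 2.
p = recombination_equilibrium(0.3, 0.6, 2.0)
print(p, 2.0 * p[0] * p[3] - p[1] * p[2])
```

With $\rho = 1$ the same routine returns the product distribution $p_1 = p_A p_B$, etc., as the corollary states.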


5. Mutation. 


The mutation field is defined by (cf. I.(8.1) and I.(8.2))

(5.1)    $N = \sum_{i,j} p_j N_{ji}\, \partial_i$,    $N_{ji} = n_{ji}$ if $i \ne j$,    $N_{ji} = -n_{i*}$ if $i = j$,

where $n_{i*} = \sum_j n_{ij}$ summed on all $j \ne i$.

Notice that the coefficient of $\partial_i$ is positive if $p_i = 0$ and that the sum of the coefficients is 0. So N is a vectorfield on $\Delta$ pointing inward on the boundary. On the other hand, the formula (5.1) with $p_j$ replaced by $x_j$ extends the definition of N to a linear vectorfield on all of $\mathbb{R}^I$ which we will also denote by N.

Define $(\mathbb{R}^I)_0 = \{x \in \mathbb{R}^I : \sum_i x_i = 0\}$. Recall that this subspace is the tangent space at p of the manifold $\mathring{\Delta}$, $T_p\mathring{\Delta}$.

1 Theorem: Assume $n_{ji} > 0$ whenever $i \ne j$. On $\Delta$ the mutation field has a unique equilibrium point, q, which lies in $\mathring{\Delta}$. The change of variable $x = p - q$ translates N to its restriction to $(\mathbb{R}^I)_0$. It is the linear differential equation associated with the matrix N which maps $(\mathbb{R}^I)_0$ to itself. Let $\rho = \max_i n_{i*}$. The set of eigenvalues of $N|(\mathbb{R}^I)_0$, i.e. the spectrum of $N|(\mathbb{R}^I)_0$, lies in the disc:

    $\mathrm{Spec}(N|(\mathbb{R}^I)_0) \subset \{z \in \mathbb{C} : |z + \rho| \le \rho\} - \{0\}$,

where $\mathbb{C}$ is the complex plane. Since all of the eigenvalues have negative real parts, 0 is a globally, asymptotically stable equilibrium for $N|(\mathbb{R}^I)_0$ and q is a globally, asymptotically stable equilibrium for N on $\Delta$. If we define the rate constant $\rho(N)$ by

    $-\rho(N) = \max \{\text{Real part } z : z \in \mathrm{Spec}(N|(\mathbb{R}^I)_0)\}$

then $\rho(N)$ measures the rate of approach to equilibrium.

(5.2)    $0 < \rho(N) \le 2\rho = 2 \max_i n_{i*}$.


Proof: Regarding N as a vectorfield on $\mathbb{R}^I$ or as a matrix, an equilibrium q is a vector satisfying $qN = 0$, i.e. $\sum_j q_j N_{ji} = 0$ for all i. Also if 1 denotes the vector all of whose entries are 1, then $N1 = 0$, i.e. $\sum_i N_{ji} = 0$ for all j.

Now let $\lambda$ be greater than the positive number $\rho$. The matrix $N + \lambda I$ is a positive matrix and so the machinery of the Frobenius theory of positive matrices applies. For a nice exposition of this theory see the Appendix of Karlin's book [18]. If P is a positive matrix, i.e. a matrix with all entries positive, then there are vectors r and $\ell$ and a positive number a satisfying:

(1) r and $\ell$ are positive vectors and are right and left eigenvectors for P with eigenvalue a, i.e.:

    $Pr = ar$ and $\ell P = a\ell$.

(2) The multiples of $\ell$ are the only left eigenvectors of P associated with the eigenvalue a and there are no other nonnegative left eigenvectors associated with any eigenvalue. Similarly for r and the right eigenvectors.

(3) Let $[r]^\perp$ consist of all vectors x such that $(x, r) = 0$ where ( , ) is the usual Euclidean inner product. $x \in [r]^\perp$ implies $xP \in [r]^\perp$ and the spectrum of the restriction $P|[r]^\perp$ is the spectrum of P with a removed. This spectrum is contained in the open disc of radius a:

    $\mathrm{Spec}\, P|[r]^\perp \subset \{z \in \mathbb{C} : |z| < a\}$.

Now we apply all this to $P = N + \lambda I$. Since $N1 = 0$, $P1 = \lambda 1$ and so by (2) $a = \lambda$ and we can let $r = 1$. By (1) there is a positive left eigenvector $\ell$ and we can normalize to get $q_i = \ell_i / \sum_j \ell_j$. So by (2) q is the unique left eigenvector of $N + \lambda I$ in $\Delta$ and has eigenvalue $\lambda$. So $qN = 0$. Since $q_i > 0$ for all i, $q \in \mathring{\Delta}$.

The translation result is clear and since $(\mathbb{R}^I)_0 = [1]^\perp$, (3) implies that the spectrum of $(N + \lambda I)|(\mathbb{R}^I)_0$ is contained in the open disc of radius $\lambda$. Subtracting $\lambda I$ just translates the spectrum and so we have

    $\mathrm{Spec}\, N|(\mathbb{R}^I)_0 \subset \{z \in \mathbb{C} : |z + \lambda| < \lambda\}$.

The intersection of these open discs as $\lambda$ approaches $\rho$ from above is $\{z \in \mathbb{C} : |z + \rho| \le \rho\} - \{0\}$. The estimates on $\rho(N)$ are then clear and the interpretation in terms of stability comes from the standard theory of linear differential equations with constant coefficients, see e.g. [15]. QED
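
The construction in this proof can be tried numerically. The sketch below builds N from a table of rates as in (5.1), forms the positive matrix $N + \lambda I$ for one choice of $\lambda > \rho$, and iterates $q \mapsto q(N + \lambda I)$ with renormalization to approximate the left eigenvector. The rates, the choice $\lambda = 2\rho$, the iteration count and all function names are assumptions for illustration only.

```python
def mutation_matrix(n):
    """N[j][i] = n[j][i] for i != j and N[i][i] = -(sum of n[i][j] over j != i)."""
    size = len(n)
    N = [[n[j][i] if i != j else 0.0 for i in range(size)] for j in range(size)]
    for i in range(size):
        N[i][i] = -sum(n[i][j] for j in range(size) if j != i)
    return N

def left_apply(x, M):
    """Row vector times matrix: (xM)_i = sum_j x_j M[j][i]."""
    size = len(x)
    return [sum(x[j] * M[j][i] for j in range(size)) for i in range(size)]

def mutation_equilibrium(n, steps=5000):
    """Approximate q with qN = 0 by power iteration on the positive matrix N + lam*I."""
    N = mutation_matrix(n)
    size = len(N)
    rho = max(-N[i][i] for i in range(size))
    lam = 2.0 * rho  # any lam > rho makes every entry of N + lam*I positive
    P = [[N[j][i] + (lam if i == j else 0.0) for i in range(size)] for j in range(size)]
    q = [1.0 / size] * size
    for _ in range(steps):
        q = left_apply(q, P)
        total = sum(q)
        q = [v / total for v in q]
    return q, N

# Hypothetical rates n[j][i] for mutation from allele j to allele i.
rates = [[0.0, 0.1, 0.2],
         [0.3, 0.0, 0.1],
         [0.05, 0.4, 0.0]]
q, N = mutation_equilibrium(rates)
print(q, left_apply(q, N))
```

The printed residual qN should be zero up to rounding, and every entry of q should be positive, in line with the theorem.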


Remark: For the Frobenius theory to apply to a matrix P it is necessary that all the entries be nonnegative but they needn't all be positive. It suffices that for some power of P all entries be positive. This means that we need not assume that $n_{ji} > 0$ for all $i \ne j$. It is sufficient to assume: (1) $n_{ji} \ge 0$ for all $i \ne j$ and (2) for every ordered pair (j, i) with $i \ne j$ we can get from j to i by a sequence of mutations, i.e. there is a sequence from j to i of distinct elements of I with $n_{k\ell} > 0$ for k, $\ell$ successive members of the sequence. See again the Appendix of [18] or, for a nice graph theoretic treatment of this problem, Demetrius [7].


2 Lemma: Let $A = (a_{ij})$ be a square matrix of corank = 1, meaning that 0 is an eigenvalue for A and the associated left and right eigenspaces are one-dimensional. Thus, there are nonzero vectors r and $\ell$, unique up to constant multiple, satisfying

    $Ar = 0$ and $\ell A = 0$.

Let M be the associated cofactor matrix for A, i.e. $M_{ij} = (-1)^{i+j}$ times the ji minor of A. Then there exists a nonzero constant K such that

    $M_{ij} = K r_i \ell_j$.

Proof: By Cramer's rule $AM = MA = \det(A) I$ and this equals 0 because A is singular. But because the corank of A is 1, some $(n-1) \times (n-1)$ minor is nonzero and so M is not the zero matrix. Since $AM = 0$ each column of M is a multiple of r. So $M_{ij} = r_i K_j$ for some constants $K_j$ not all zero. Since $MA = 0$ each row of M is a multiple of $\ell$. Now if $r_{i_0} \ne 0$ then we can write the $i_0$ row of M as $K r_{i_0} \ell_j$ and so get $K_j = K \ell_j$. QED

3 Corollary: For the mutation field matrix N let $M_{ii}$ be the ii minor. Then

    $q_i = M_{ii} / \sum_j M_{jj}$.

Proof: We apply Lemma 2 with A = N. In the proof of Thm. 1, we showed that the eigenspaces of $N + \lambda I$ associated with $\lambda$ are one-dimensional. So the eigenspaces of N associated with 0 are one-dimensional and we can choose $\ell = q$ and $r = 1$. Lemma 2 then says that $M_{ij} = K q_j$ for some nonzero constant K. In particular, $M_{ii} = K q_i$ and $\sum_j M_{jj} = K$. QED
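
Corollary 3 gives the equilibrium in closed form, which is easy to check on a small example. The sketch below computes the principal minors of a 3 x 3 mutation matrix and verifies that the resulting q satisfies qN = 0. The matrix entries and helper names are illustrative assumptions, and the Laplace expansion is only sensible for small matrices.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 0:
        return 1.0
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for c in range(len(M)):
        minor = [row[:c] + row[c + 1:] for row in M[1:]]
        total += (-1) ** c * M[0][c] * det(minor)
    return total

def principal_minor(M, i):
    """Determinant of M with row i and column i deleted."""
    return det([row[:i] + row[i + 1:] for k, row in enumerate(M) if k != i])

# Hypothetical mutation matrix: off-diagonal N[j][i] = n_ji, rows sum to zero.
N = [[-0.3, 0.1, 0.2],
     [0.3, -0.4, 0.1],
     [0.05, 0.4, -0.45]]

minors = [principal_minor(N, i) for i in range(3)]
q = [m / sum(minors) for m in minors]
residual = [sum(q[j] * N[j][i] for j in range(3)) for i in range(3)]
print(minors, q, residual)
```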

There is an important special case where the equilibrium is obvious and where there is a simple Lyapunov function for the mutation field. This is when the forward and backward mutation rates are the same.

4 Theorem: Suppose $n_{ij} = n_{ji}$ for all i, j distinct in I. The center of the simplex q is the equilibrium of the mutation field. The function $f: \Delta \to \mathbb{R}$:

(5.3)    $f(p) = -\sum_i (p_i - q_i)^2$

is a Lyapunov function for N on $\Delta$.

Proof: In this case the matrix N is symmetric. The differential equation for mutation extended to $\mathbb{R}^I$ is

    $dx/dt = xN$.

With ( , ) the usual inner product on $\mathbb{R}^I$, we have

(5.4)    $\frac{d}{dt} \tfrac{1}{2}(x, x) = (xN, x)$.

On the invariant subspace $(\mathbb{R}^I)_0$ the eigenvalues of N have negative real parts by Thm. 1 and so by symmetry are negative. This implies that the quadratic function $(xN, x)$ is negative definite on $(\mathbb{R}^I)_0$, i.e.

(5.5)    $(xN, x) < 0$ if $x \in (\mathbb{R}^I)_0$ and $x \ne 0$.

Since $N1 = 0$ symmetry implies $1N = 0$ and so the equilibrium q is the vector 1 normalized to lie in $\Delta$, i.e. $q_i = 1/n$ where n is the number of elements in I. Since $qN = 0$ we have for p in $\Delta$:

(5.6)    $\frac{d}{dt} \tfrac{1}{2}(p - q, p - q) = (pN, p - q) = ((p - q)N, p - q)$.

By (5.5) this is negative unless p = q. This proves (5.3) since $\tfrac{1}{2}(p - q, p - q) = -\tfrac{1}{2} f(p)$. QED
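
To see Theorem 4 at work, the following sketch integrates $dp/dt = pN$ for a symmetric N with a simple Euler scheme and records $f(p) = -\sum_i (p_i - 1/n)^2$ along the way. The rates, the step size, the starting point and the function names are assumptions chosen for illustration; the Euler scheme is a stand-in for the exact flow.

```python
def euler_step(p, N, dt):
    """One Euler step of dp/dt = pN (row vector times matrix)."""
    size = len(p)
    return [p[i] + dt * sum(p[j] * N[j][i] for j in range(size)) for i in range(size)]

def f(p):
    """Lyapunov function of (5.3) with q the center of the simplex."""
    n = len(p)
    return -sum((x - 1.0 / n) ** 2 for x in p)

# Hypothetical symmetric rates: n_12 = 0.1, n_13 = 0.2, n_23 = 0.3.
N = [[-0.3, 0.1, 0.2],
     [0.1, -0.4, 0.3],
     [0.2, 0.3, -0.5]]

p = [0.7, 0.2, 0.1]
values = [f(p)]
for _ in range(2000):
    p = euler_step(p, N, 0.01)
    values.append(f(p))

print(p, values[0], values[-1])
```

For a small enough step the recorded values never decrease and p approaches the center of the simplex, which is the behavior the theorem asserts for the exact flow.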

Remark: In this case the mutation field is the gradient of the quadratic function $\tfrac{1}{2}((p - q)N, p - q)$, but it is the gradient with respect to the usual inner product, not with respect to the Shahshahani metric.
For the multilocus model with $I = \prod_\alpha I_\alpha$, the mutation field for the $\alpha$ locus on $\Delta_\alpha$ is:

    $\bar N^\alpha = \sum_{i_\alpha, j_\alpha} p_{j_\alpha} \bar N^\alpha_{j_\alpha i_\alpha}\, \partial_{i_\alpha}$,

    $\bar N^\alpha_{j_\alpha i_\alpha} = n^\alpha_{j_\alpha i_\alpha}$ if $i_\alpha \ne j_\alpha$,    $\bar N^\alpha_{j_\alpha i_\alpha} = -n^\alpha_{i_\alpha *}$ if $i_\alpha = j_\alpha$,

where $n^\alpha_{i_\alpha *} = \sum n^\alpha_{i_\alpha j_\alpha}$, summed on all $j_\alpha \ne i_\alpha$.

On $\Delta$ the partial mutation field corresponding to mutation at the $\alpha$ locus is:

    $N^\alpha = \sum_{i,j} p_j N^\alpha_{ji}\, \partial_i$,    $N^\alpha_{ji} = \bar N^\alpha_{j_\alpha i_\alpha}\, \delta_{j_{\tilde\alpha} i_{\tilde\alpha}}$.

The Kronecker delta notation means that $N^\alpha_{ji} = \bar N^\alpha_{j_\alpha i_\alpha}$ if i = j at all loci other than $\alpha$ and $N^\alpha_{ji} = 0$ otherwise.

Finally, the full mutation field is the sum:

    $N = \sum_\alpha N^\alpha = \sum_{i,j} p_j N_{ji}\, \partial_i$,    $N_{ji} = \sum_\alpha N^\alpha_{ji}$.

Extending $\bar N^\alpha$ to a linear vectorfield on $\mathbb{R}^{I_\alpha}$, and $N^\alpha$, N to linear vectorfields on $\mathbb{R}^I$, we can identify these linear vectorfields with the corresponding linear maps and with the matrices operating on the right, e.g. $N: \mathbb{R}^I \to \mathbb{R}^I$ by $N(x)_i = \sum_j x_j N_{ji}$.

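5 Theorem: For $\alpha, \beta \in L = \{1, \dots, \ell\}$ the following diagram commutes (meaning the two composed maps are equal):

\[
\begin{array}{ccc}
\mathbb{R}^I & \xrightarrow{\;E^\beta\;} & \mathbb{R}^{I_\beta} \\
{\scriptstyle N^\alpha}\downarrow & & \downarrow{\scriptstyle \delta_{\alpha\beta}\bar N^\alpha} \\
\mathbb{R}^I & \xrightarrow{\;E^\beta\;} & \mathbb{R}^{I_\beta}
\end{array}
\]

so that if $\alpha = \beta$ the vertical map on the right is $\bar N^\alpha$ and if $\alpha \ne \beta$ the vertical map is 0.

The following diagram commutes:

\[
\begin{array}{ccc}
\mathbb{R}^I & \xrightarrow{\;E\;} & \prod_\alpha \mathbb{R}^{I_\alpha} \\
{\scriptstyle N}\downarrow & & \downarrow{\scriptstyle \prod_\alpha \bar N^\alpha} \\
\mathbb{R}^I & \xrightarrow{\;E\;} & \prod_\alpha \mathbb{R}^{I_\alpha}
\end{array}
\]

Thus, the vectorfields N and $\prod_\alpha \bar N^\alpha$ are E related, meaning that E maps the vector N at p to the vector $\prod_\alpha \bar N^\alpha$ at E(p).

Assume that $n^\alpha_{j_\alpha i_\alpha} > 0$ whenever $i_\alpha \ne j_\alpha$, for all $\alpha$. Let $q^\alpha \in \mathring{\Delta}_\alpha$ be the equilibrium for the vectorfield $\bar N^\alpha$. The unique globally asymptotically stable equilibrium point, q, of the mutation field N on $\Delta$ satisfies:

(5.10)    $q_i = \prod_\alpha q^\alpha_{i_\alpha}$

and so $q \in \Lambda \subset \mathring{\Delta}$.

Defining the rate constants $\rho(N)$ and $\rho(\bar N^\alpha)$ as in Thm. 1, we have

(5.11)    $0 < \rho(N) \le \min_\alpha \rho(\bar N^\alpha) \le 2 \min_\alpha \max_{i_\alpha} n^\alpha_{i_\alpha *}$.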

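Proof:

(5.12)    $N^\alpha(x)_i = \sum_j x_j N^\alpha_{ji} = \sum_{j_\alpha} x_{j_\alpha i_{\tilde\alpha}} \bar N^\alpha_{j_\alpha i_\alpha}$,

i.e. the only nonzero terms in the first sum are those where j agrees with i at the loci other than $\alpha$. Now applying $E^\alpha$ amounts to summing over the $i_{\tilde\alpha}$ indices. So we have

    $E^\alpha(N^\alpha(x))_{i_\alpha} = \sum_{j_\alpha} E^\alpha(x)_{j_\alpha} \bar N^\alpha_{j_\alpha i_\alpha} = \bar N^\alpha(E^\alpha(x))_{i_\alpha}$.

On the other hand, applying $E^\beta$ with $\beta \ne \alpha$ includes summing over the $i_\alpha$ indices. The row sums of $\bar N^\alpha$ are all zero, i.e. $\sum_{i_\alpha} \bar N^\alpha_{j_\alpha i_\alpha} = 0$ for all $j_\alpha$. So $E^\beta(N^\alpha(x)) = 0$ if $\beta \ne \alpha$. Thus the first diagram commutes, and so for each $\alpha$ does the following

This implies commutativity of the second.

Now if we apply (5.12) with x = q defined by (5.10) we get

    $N^\alpha(q)_i = \bigl(\sum_{j_\alpha} q^\alpha_{j_\alpha} \bar N^\alpha_{j_\alpha i_\alpha}\bigr) \prod_{\beta \ne \alpha} q^\beta_{i_\beta} = 0$.

So q is an equilibrium for each $N^\alpha$ and so for N. But by Thm. 1 and the Remark thereafter N has a unique equilibrium and it is globally, asymptotically stable. So q of (5.10) is it.

Finally, the rate constant estimate (5.11) follows from:

    $\bigcup_\alpha \mathrm{Spec}(\bar N^\alpha|(\mathbb{R}^{I_\alpha})_0) \subset \mathrm{Spec}(N|(\mathbb{R}^I)_0)$.

To prove this let $\bar x \in (\mathbb{R}^{I_\alpha})_0$ be a left eigenvector for $\bar N^\alpha$ with eigenvalue z, i.e. $\sum_{j_\alpha} \bar x_{j_\alpha} \bar N^\alpha_{j_\alpha i_\alpha} = z \bar x_{i_\alpha}$. Define $x \in (\mathbb{R}^I)_0$ by $x_i = \bar x_{i_\alpha} \prod_{\beta \ne \alpha} q^\beta_{i_\beta}$ and apply (5.12):

    $\sum_j x_j N^\alpha_{ji} = \bigl(\sum_{j_\alpha} \bar x_{j_\alpha} \bar N^\alpha_{j_\alpha i_\alpha}\bigr) \prod_{\beta \ne \alpha} q^\beta_{i_\beta} = z x_i$,

    $\sum_j x_j N^\gamma_{ji} = \bar x_{i_\alpha} \bigl(\sum_{j_\gamma} q^\gamma_{j_\gamma} \bar N^\gamma_{j_\gamma i_\gamma}\bigr) \prod_{\beta \ne \alpha, \gamma} q^\beta_{i_\beta} = 0$    ($\gamma \ne \alpha$).

So x is a left eigenvector for N with eigenvalue z.

QED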
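
The product formula (5.10) can be checked directly on a small case. The sketch below takes two loci with two alleles each, assembles the entries $N_{ji}$ of the full mutation matrix from the single-locus matrices by the Kronecker delta rule, and confirms that the product of the single-locus equilibria is annihilated by N. The rates and all names are hypothetical choices for this example.

```python
from itertools import product

def single_locus(u, v):
    """Two-allele mutation matrix: rate u from allele 0 to 1, rate v from 1 to 0."""
    return [[-u, u], [v, -v]]

def single_equilibrium(u, v):
    """Equilibrium of the two-allele matrix above."""
    return [v / (u + v), u / (u + v)]

# Hypothetical (u, v) rates for locus 0 and locus 1.
loci = [(0.1, 0.3), (0.2, 0.05)]
Nbar = [single_locus(u, v) for u, v in loci]
qbar = [single_equilibrium(u, v) for u, v in loci]
gamete_list = list(product(range(2), repeat=len(loci)))

def full_entry(j, i):
    """N_ji: sum over loci a of Nbar^a[j_a][i_a] when j and i agree off locus a."""
    total = 0.0
    for a in range(len(loci)):
        if all(j[b] == i[b] for b in range(len(loci)) if b != a):
            total += Nbar[a][j[a]][i[a]]
    return total

def product_distribution(i):
    """q_i as the product of single-locus equilibrium frequencies, as in (5.10)."""
    value = 1.0
    for a in range(len(loci)):
        value *= qbar[a][i[a]]
    return value

q = {i: product_distribution(i) for i in gamete_list}
residual = {i: sum(q[j] * full_entry(j, i) for j in gamete_list) for i in gamete_list}
print(q, residual)
```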


Actually we can say much more than just that q lies in the 
Wright manifold. 

a 

6 Addendum: The mutation fields $N^\alpha$ and N are all tangent to the Wright manifold $\Lambda$ at all points of $\Lambda$. So $\Lambda$ is an invariant manifold for the associated flows.

a 

Proof : This can be proved by direct computation. N is tangent to 

A iff N?/p^ as a function of i lies in for p € A (cf. 

Addendum 11.1.3(2)). If p e A this is E, p. N a . /p. which depends 

■^a ■'a a 1 a 

only on the a locus. QED 

Remark: It is crucial for this result not only that mutation occur independently at separate loci, but also that the mutation rates at the $\alpha$ locus depend only on the alleles at that locus. It is no longer true when genes at one locus influence the mutation rate at others.

Addendum 6 could also be proved using the following result 
which is related to Shahshahani's Prop. 3.3 [28]: 


7 Proposition: (a) The vectorfields $N^\alpha$ and $N^\beta$ on $\mathbb{R}^I$ commute, i.e. the Lie bracket $[N^\alpha, N^\beta] = 0$.

(b) Let $S \subset L$ and assume that the birth rates $b_{ij}$ and recombination rates $r^S_{ij}$ are genotype independent, i.e. $b_{ij} = b$ and $r^S_{ij} = r^S$ for all $i, j \in I$. In that case, the mutation fields $N^\alpha$ and the recombination fields $R^S$ commute, i.e. $[N^\alpha, R^S] = 0$ for all $\alpha$, S. So the total mutation field N and the total recombination field $R = \sum_S R^S$ commute.


Proof: (a): Commuting of linear vectorfields is the same as the commuting of the associated matrices. Since any matrix commutes with itself, we can assume $\alpha \ne \beta$ and show that $N^\alpha N^\beta = N^\beta N^\alpha$. It is easy to check that:

    $(N^\alpha N^\beta)_{ij} = \bar N^\alpha_{i_\alpha j_\alpha}\, \bar N^\beta_{i_\beta j_\beta}\, \delta_{i_{\widetilde{(\alpha,\beta)}}\, j_{\widetilde{(\alpha,\beta)}}}$.

This is clearly symmetric in $\alpha$ and $\beta$.
(b): Define 


a V ' a ^ * -a \ -a 

N. = > p.N.. = > p.N . 6. . = > p. . N. . . 

1 / j 31 / j 3 1 3~i~ _ 3 i~ 3 1 

J J J J a a J a a .— J a a J a a 


So that n“ = E. N?S.. Since n“ 

ill 3 * 

J a 


0 
(5.13)


Now if a g S 



a 

N. . 

D- 1 ' 


S S 


0 


let T = S - (a). 


if a g S (i~ g I~ fixed). 


Then it is also clear that 


(5.14) 




-a 

p. . N. . 
Jill 

a T J a a 



-a 

p. N. . 6. . 

i i -i i 
J S a a J T T 


a S 

Now to show [N ,R ] = 0, we must show that 

2. d S 5.N a = 2. N a 9.d S for all i, on L (cf. (3.2)). Since R S 
3331 3331 


we can assume a g S. 


R 


S 


Z N?d . d S = \ N a (6..-p. 6. . -p. 6. . 

D D i Z_ D !D Dg i s i g 3 - 1 * 

j j 

a \ a V a 

= N. - p. > N. . - p. ) N. . 

1 Z__ 1 q ZL Do 1 ' 


■ < - p i s X 


S J S S J S S 

3 S 


N a (by (5.13)). 

1 s- l s 


On the other hand, 


Z d S d .N a = (P- - P- P- )N a - 

D D 1 D *D g Dg D 1 

j j 

a -a 

= N. - > p.p.N. . 6. . 

i / ~\ -i~ -i i -i~i~ 

*— J S J S J a a J a a 


Now 6. . = 6. . 6. . and so summing first on j~ and then applying 

“ , a 1 a D T :L T S 

(5.14) we have 
Z S a a NT -a

d.d.N. = N. - y p. p. N. . 6. . 

nni 1 / -] i i -ii 

j j ^- j s s J a a J T T 


a ^ -a a 

= N. - p. > N. . = > N.d .d. . 

1 L s 4— Vs 4— 3 3 1 


QED 
IV. The Hopf Bifurcation


1. The Hessian.


Let $X = \sum_i X_i \partial_i = \sum_i p_i \xi_i \partial_i$ be a vectorfield on $\mathring{\Delta}$. X is a function from $\mathring{\Delta}$ to $\mathbb{R}^I$. We define the Hessian of X at p, $H_pX : T_p\mathring{\Delta} \times T_p\mathring{\Delta} \to \mathbb{R}$, to be the bilinear form defined by:

(1.1)    $H_pX(Y^1, Y^2) = (d_pX(Y^1), Y^2)_p$.

So to get $H_pX$ we take the derivative of X at p in the direction $Y^1$ and then take the inner product with $Y^2$ using the Shahshahani metric at p. If we extend the functions $X_i$ and $\xi_i$ to get vectorfields $X = \sum_i X_i \partial_i = \sum_i x_i \xi_i \partial_i$ on $\mathbb{R}^I_+$ we can compute:

( 1 . 2 ) 


H X (Y 
x 


\y 2 > = y x 7 xp Y y 

j J 

i ill / dXj 3 3 


Just as with the corresponding formula for the derivative I. (3.2.4) 

these formulae taken at x = p are independent of the choice of extend 

1 2 

ing functions provided that the vectors Y and Y lie in the tangent 

space T p A = (F I ) Q = (Y € F 1 : E Y ± = 0#. 

Taking the Hessian at $p$ is itself a linear operation. For $X^1$ and $X^2$ vectorfields on $\Delta$ and $t \in \mathbb{R}$:

(1.3)    $H_p(tX^1 + X^2) = t(H_pX^1) + (H_pX^2)$.
P P P 


In order to study the stability properties of an equilibrium point $p$ of a vectorfield $X$ on $\Delta$, one looks at the derivative $d_pX: T_p\Delta \to T_p\Delta$ and computes the eigenvalues. The Hessian is important here because:


1 Lemma: With $\{Y^r : r = 1,\ldots,n-1\}$ an $( , )_p$ orthonormal basis for $T_p\Delta = (\mathbb{R}^I)_0$, the matrix $a_{rs}$ of the linear map $d_pX: (\mathbb{R}^I)_0 \to (\mathbb{R}^I)_0$ is given by:

(1.4)    $a_{rs} = H_pX(Y^s,Y^r)$    $(r,s = 1,\ldots,n-1)$.

In particular, the eigenvalues of the linear map $d_pX$ are the same as the eigenvalues of the bilinear form $H_pX$.

Proof: The $s$-th column $a_{rs}$ $(r = 1,\ldots,n-1)$ consists of the coordinates of $d_pX(Y^s)$ with respect to the $Y$-basis. Since the basis is orthonormal, the $Y^r$ coordinate is obtained by taking the inner product with $Y^r$. Hence, (1.4).


Remark : The eigenvalues of a linear map are independent of the choice 

of basis. For a bilinear form the independence is only over the 
choice of orthonormal basis. 
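
Lemma 1 can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a vectorfield of the form $X_i = x_i((Ax)_i - x \cdot Ax)$ with an arbitrary matrix `A` (our choice, purely for illustration), builds a basis of $(\mathbb{R}^I)_0$ that is orthonormal for the Shahshahani inner product $(Y^1,Y^2)_p = \sum_i p_i^{-1}Y^1_iY^2_i$, and confirms that the matrix $a_{rs} = H_pX(Y^s,Y^r)$ represents $d_pX$ in that basis.

```python
import numpy as np

def field_jacobian(A, x):
    """Jacobian dX_i/dx_j of the extended field X_i(x) = x_i * ((A x)_i - x.A x)."""
    Ax = A @ x
    xi = Ax - x @ Ax                         # xi_i = (A x)_i - x.A x
    dxi = A - (Ax + A.T @ x)[None, :]        # d xi_i / d x_j
    return np.diag(xi) + x[:, None] * dxi

def shahshahani_orthonormal_basis(p):
    """Columns span {Y : sum(Y) = 0} and are orthonormal for sum_i Y1_i Y2_i / p_i."""
    n = len(p)
    B0 = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])   # columns e_i - e_n
    G = np.diag(1.0 / p)
    L = np.linalg.cholesky(B0.T @ G @ B0)                   # Gram matrix = L L^T
    return B0 @ np.linalg.inv(L).T

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))          # arbitrary illustrative matrix
p = rng.dirichlet(np.ones(4))        # a point of the open simplex

J = field_jacobian(A, p)
B = shahshahani_orthonormal_basis(p)
G = np.diag(1.0 / p)
a = B.T @ G @ J @ B                  # a[r, s] = (d_p X(Y^s), Y^r)_p = H_p X(Y^s, Y^r)
```

The identity `J @ B == B @ a` (up to rounding) says exactly that the columns of `a` are the coordinates of $d_pX(Y^s)$ in the $Y$-basis, so `a` and the restriction of $d_pX$ to the tangent space share their eigenvalues.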


Any bilinear form can be decomposed into its symmetric and anti-symmetric (or alternating) parts. So we define:

(1.5)    $SH_p(Y^1,Y^2) = \tfrac12\,(H_p(Y^1,Y^2) + H_p(Y^2,Y^1))$,
         $AH_p(Y^1,Y^2) = \tfrac12\,(H_p(Y^1,Y^2) - H_p(Y^2,Y^1))$.

$SH_p$ is symmetric, $AH_p$ is alternating and their sum is $H_p$, i.e.

(1.6)    $SH_p(Y^1,Y^2) = SH_p(Y^2,Y^1)$,
         $AH_p(Y^1,Y^2) = -AH_p(Y^2,Y^1)$,
         $H_p = SH_p + AH_p$.

This decomposition gives a test for gradient vectorfields.

2 Theorem: A vectorfield on $\mathring{\Delta}$ is a gradient field with respect to the Shahshahani metric iff the Hessian is symmetric at all points of $\mathring{\Delta}$. In detail, $H_pX = SH_pX$ for all $p$ in $\mathring{\Delta}$, or equivalently $AH_pX = 0$ for all $p$ in $\mathring{\Delta}$, iff there exists $f: \mathring{\Delta} \to \mathbb{R}$ such that $X(p) = \bar\nabla_p f$ for $p$ in $\mathring{\Delta}$.

Proof: The proof is a direct computation. But before diving into it we will describe what is really going on from the tensor analysis point of view.

The vectorfield $X$ is dual with respect to the Shahshahani metric to a differential form $\omega$ on the tangent space. In terms of Thm. I.3.1, $\omega_p = X(p)^*$. By definition $X$ is the gradient $\bar\nabla f$ iff $\omega$ is the differential $df$. So $X$ is a gradient iff $\omega$ is an exact form. Now the covariant derivative of $\omega$ is a bilinear form and its alternating part is the exterior derivative $d\omega$ (cf. [25, Thm. 5.7]). So the covariant derivative of $\omega$ is symmetric iff $\omega$ is a closed form. Because $\mathring{\Delta}$ is simply connected, closed = exact for linear differential forms. One can actually compute this covariant derivative by using the change of coordinates in Thm. I.4.1. It is not quite the same as the Hessian, essentially because the constant fields are not autoparallel with respect to the Shahshahani metric on $\mathring{P}$. However, the two bilinear forms differ only in the symmetric part for any vectorfield $X$. Putting this all together we get that the alternating part of the Hessian is everywhere zero iff the vectorfield is a gradient.

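Starting again, suppose $X$ is the gradient of $f: \mathring{\Delta} \to \mathbb{R}$. Extend $f$ to a function on $\mathring{P} = \{x \in \mathbb{R}^I : x_i > 0 \text{ for all } i\}$. For notational convenience define

(1.7)    $\frac{\partial f}{\partial x} = \sum_k x_k \frac{\partial f}{\partial x_k}$.

By I. (4.12) if $X = \bar\nabla f$, then for all $i$:

(1.8)    $\xi_i = \frac{\partial f}{\partial x_i} - \frac{\partial f}{\partial x}$.

Taking the partial with respect to $x_j$ we have, using (1.7):

(1.9)    $\frac{\partial \xi_i}{\partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i} - \sum_k x_k \frac{\partial^2 f}{\partial x_k \partial x_j} - \frac{\partial f}{\partial x_j}$.

Now substitute in (1.2) and note that for $Y^1,Y^2 \in (\mathbb{R}^I)_0$, $\sum_i Y^2_i = 0$ means that the last two terms on the right in (1.9), which do not depend on $i$, make no contribution to $H_p(X)$. So we have:

(1.10)    $H_p(\bar\nabla f)(Y^1,Y^2) = \sum_i p_i^{-1}\left(\frac{\partial f}{\partial x_i} - \frac{\partial f}{\partial x}\right) Y^1_i Y^2_i + \sum_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j} Y^1_j Y^2_i$    $(Y^1,Y^2 \in (\mathbb{R}^I)_0)$.

By symmetry of the mixed partial derivatives formula (1.10) shows that $H_p(\bar\nabla f)$ is symmetric in $Y^1$ and $Y^2$.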

For the converse, suppose that $X$ is a vectorfield on $\mathring{\Delta}$ with $H_pX$ symmetric at every point $p$ of $\mathring{\Delta}$. So

(1.11)    $H_xX(Y^1,Y^2) = H_xX(Y^2,Y^1)$

for $x = p \in \mathring{\Delta}$ and $Y^1,Y^2 \in (\mathbb{R}^I)_0$.

In applying (1.2) we can use any extension of $X$ to $\mathring{P}$. We use the trick introduced in the proof of Prop. III.1.1. Choose the extension so that each function $\xi_i: \mathring{P} \to \mathbb{R}$ is homogeneous of degree $-1$, i.e. $\xi_i(x) = |x|^{-1}\xi_i(x/|x|)$ with $|x| = \sum_i x_i$. Then by Euler's theorem on homogeneous functions we have for each $i$:

(1.12)    $\sum_j x_j \frac{\partial \xi_i}{\partial x_j} = -\xi_i$.

Also we know that $\sum_i p_i\xi_i = \sum_i X_i = 0$ at every point $p$ of $\mathring{\Delta}$. By homogeneity $\sum_i x_i\xi_i = 0$ for all $x$ in $\mathring{P}$. Taking the partial derivative with respect to $x_j$ we get:

(1.13)    $\sum_i x_i \frac{\partial \xi_i}{\partial x_j} = -\xi_j$.

I now claim that (1.11) holds for all $x \in \mathring{P}$ and all $Y^1,Y^2 \in \mathbb{R}^I$. From (1.2) $H_xX(Y^1,Y^2)$ is homogeneous of degree $-2$ in $x$ and so for $Y^1,Y^2 \in (\mathbb{R}^I)_0$ (1.11) holds for all $x$ in $\mathring{P}$ because it holds for $p = x/|x|$ in $\mathring{\Delta}$. Now since every vector in $\mathbb{R}^I$ can be written in the form $Y + tx$ with $Y$ in $(\mathbb{R}^I)_0$, the extension of (1.11) to all of $\mathbb{R}^I$ follows from:

(1.14)    $H_xX(x,Y^2) = 0 = H_xX(Y^1,x)$    $(Y^1,Y^2 \in \mathbb{R}^I)$.

This follows from direct substitution in (1.2) using (1.12) when $Y^1 = x$ and (1.13) when $Y^2 = x$.

Thus, the symmetry condition (1.11) holds for all $Y^1$ and $Y^2$ in $\mathbb{R}^I$. Since the first sum in the $\xi$-version of (1.2) is always symmetric, this symmetry implies that the matrix $(\partial\xi_i/\partial x_j)$ is symmetric:

(1.15)    $\frac{\partial \xi_i}{\partial x_j} = \frac{\partial \xi_j}{\partial x_i}$    $(x \in \mathring{P},\ i,j \in I)$.

These are the classic integrability conditions for the differential form $\sum_i \xi_i\,dx_i$. By the Poincaré Lemma [8, Thm. V.8.1] (1.15) implies that there exists a function $f: \mathring{P} \to \mathbb{R}$ such that

(1.16)    $\xi_i = \frac{\partial f}{\partial x_i}$    $(x \in \mathring{P},\ i \in I)$.

Since $\sum_i x_i\xi_i = 0$, $\frac{\partial f}{\partial x} = 0$ and so when we restrict $f$ to a function on $\mathring{\Delta}$, (1.8) implies that $X$ is the gradient $\bar\nabla f$. QED
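
The symmetry test of Theorem 2 is easy to try numerically. The sketch below is an illustration under assumptions of ours: it uses fields of the form $X_i = x_i((Ax)_i - x \cdot Ax)$, computes the matrix of $H_pX$ on the basis $e_i - e_n$ of $(\mathbb{R}^I)_0$, and extracts its alternating part. For a symmetric `A` (a selection matrix, hence a gradient field) the alternating part should vanish; for the antisymmetric rock-paper-scissors matrix, chosen here only as a convenient non-gradient example, it should not.

```python
import numpy as np

def hessian_form(A, p):
    """Matrix h with H_p X(Y1, Y2) = Y2 @ h @ Y1 for X_i = x_i((A x)_i - x.A x)."""
    Ax = A @ p
    xi = Ax - p @ Ax
    dxi = A - (Ax + A.T @ p)[None, :]        # d xi_i / d x_j
    J = np.diag(xi) + p[:, None] * dxi       # Jacobian dX_i/dx_j
    return np.diag(1.0 / p) @ J              # Shahshahani weights p_i^{-1}

def alternating_part(A, p):
    """AH_p restricted to {Y : sum(Y) = 0}, in the basis e_i - e_n."""
    n = len(p)
    B0 = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])
    h = B0.T @ hessian_form(A, p) @ B0       # h[r, s] = H_p X(Y^s, Y^r)
    return 0.5 * (h - h.T)

rng = np.random.default_rng(0)
m = rng.normal(size=(4, 4))
m = 0.5 * (m + m.T)                          # symmetric: selection, a gradient field
rps = np.array([[0.0, 1.0, -1.0],
                [-1.0, 0.0, 1.0],
                [1.0, -1.0, 0.0]])           # antisymmetric: not a gradient field

alt_sym = alternating_part(m, rng.dirichlet(np.ones(4)))
alt_rps = alternating_part(rps, np.ones(3) / 3)
```

Symmetry of a bilinear form does not depend on the basis used, so the non-orthonormal basis $e_i - e_n$ suffices for this test.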


Now we compute the Hessian of our biological vectorfields. Selection is easy since it is a gradient. Apply (1.10):

(1.17)    $H_p(\bar\nabla(\tfrac12\bar m))(Y^1,Y^2) = \sum_i p_i^{-1}(m_i - \bar m)Y^1_iY^2_i + \sum_{i,j} m_{ij}Y^1_iY^2_j$    $(Y^1,Y^2 \in (\mathbb{R}^I)_0)$.

It is important to see how different the two terms on the right are. Fix $p \in \Delta$ for the moment and following I. (6.6) write $m_{ij} = \bar m + (m_i - \bar m) + (m_j - \bar m) + \theta_{ij}$. Then since $\sum_i Y^1_i = \sum_i Y^2_i = 0$, (1.17) becomes:

(1.18)    $H_p(\bar\nabla(\tfrac12\bar m))(Y^1,Y^2) = \sum_i p_i^{-1}(m_i - \bar m)Y^1_iY^2_i + \sum_{i,j} \theta_{ij}Y^1_iY^2_j$    $(Y^1,Y^2 \in (\mathbb{R}^I)_0)$.

So the first term depends on the additive part and the second term depends on the dominance part of the selection matrix $m_{ij}$.
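
The passage from (1.17) to (1.18) can be illustrated with numbers. This is a small sketch with a randomly generated symmetric matrix (the values are made up): it forms the decomposition $m_{ij} = \bar m + (m_i - \bar m) + (m_j - \bar m) + \theta_{ij}$ at a point $p$, with $m_i = \sum_j m_{ij}p_j$ and $\bar m = \sum_i p_im_i$, and compares the two formulas on tangent vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
m = rng.normal(size=(n, n))
m = 0.5 * (m + m.T)                  # symmetric fitness matrix (illustrative values)
p = rng.dirichlet(np.ones(n))        # a point of the open simplex

m_i = m @ p                          # marginal fitnesses m_i
m_bar = p @ m_i                      # mean fitness
# dominance part: what is left after removing the mean and the additive effects
theta = m - m_bar - (m_i - m_bar)[:, None] - (m_i - m_bar)[None, :]

def tangent_vector(rng, n):
    """A random vector with coordinate sum zero, i.e. an element of (R^I)_0."""
    y = rng.normal(size=n)
    return y - y.mean()

Y1, Y2 = tangent_vector(rng, n), tangent_vector(rng, n)
additive_term = np.sum((m_i - m_bar) / p * Y1 * Y2)
h_117 = additive_term + Y1 @ m @ Y2          # right side of (1.17)
h_118 = additive_term + Y1 @ theta @ Y2      # right side of (1.18)
```

The two values agree because every term of $m_{ij} - \theta_{ij}$ is constant in $i$ or in $j$, and so is killed by $\sum_i Y^1_i = \sum_j Y^2_j = 0$. The vector `theta @ p` also vanishes, which is the statement that $\theta_{ij}$ is a pure dominance term at $p$.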

3 Proposition: Suppose that the recombination and birth rates are completely symmetric so that the recombination field $R^S$ is given by III.2.1. Then in tensor notation the Hessian

(1.19)    $H_pR^S = (1/4)\sum_{i,j} r^S_{ij}b_{ij}\, d_p(d^S_{ij}) \otimes d_pL^S_{ij}$

          $= (1/8)\sum_{i,j} r^S_{ij}b_{ij}\,(p_ip_j + p_{\bar i}p_{\bar j})\, d_pL^S_{ij} \otimes d_pL^S_{ij}$

          $+ (1/8)\sum_{i,j} r^S_{ij}b_{ij}\,d^S_{ij}\,[d_p(\ln p_ip_j) \otimes d_p(\ln p_ip_j) - d_p(\ln p_{\bar i}p_{\bar j}) \otimes d_p(\ln p_{\bar i}p_{\bar j})]$

          $+ (1/4)\sum_{i,j} r^S_{ij}b_{ij}\,d^S_{ij}\,[d_p(\ln p_{\bar i}p_{\bar j}) \wedge d_p(\ln p_ip_j)]$.

The first two terms are symmetric and so equal $SH_pR^S$. The third term is alternating and so equals $AH_pR^S$.


Proof: We compute the Hessian directly from the definition (1.1). $4R^S = \sum r^S_{ij}b_{ij}d^S_{ij}\bar\nabla L^S_{ij}$. For convenience we will drop the constant factor $r^S_{ij}b_{ij}$ which occurs in all of the sums. It is completely symmetric and so is not affected by the symmetry $i,j \to \bar i,\bar j$. So we assume $4R^S = \sum d^S_{ij}\bar\nabla L^S_{ij}$.

Recall from I. (7.3) that the gradient $\bar\nabla L^S_{ij}$ is a constant linear combination of the constant fields $\partial_i$. So its derivative $d_p(\bar\nabla L^S_{ij}) = 0$. Consequently:

$d_p(4R^S) = \sum d_p(d^S_{ij})\,\bar\nabla_pL^S_{ij}$,

$H_p(4R^S) = \sum d_p(d^S_{ij}) \otimes d_pL^S_{ij}$,

where we are using the duality between the gradient of $L^S_{ij}$ and its differential.

$H_p(4R^S) = \sum [p_ip_j\,d_p(\ln p_ip_j) - p_{\bar i}p_{\bar j}\,d_p(\ln p_{\bar i}p_{\bar j})] \otimes d_pL^S_{ij}$.

Subtracting and adding the term $p_ip_j\,d_p(\ln p_{\bar i}p_{\bar j})$ in the brackets we break up $H_p(4R^S)$ into two sums. The first, $\Sigma_1$, =

$\sum [p_ip_j\,d_p(\ln p_ip_j) - p_ip_j\,d_p(\ln p_{\bar i}p_{\bar j})] \otimes d_pL^S_{ij}$

$= \sum p_ip_j\, d_pL^S_{ij} \otimes d_pL^S_{ij}$

$= \sum p_{\bar i}p_{\bar j}\, d_pL^S_{ij} \otimes d_pL^S_{ij}$.

The last equation holds because the interchange $ij \to \bar i\bar j$ changes the sign of both $d_pL^S_{ij}$ factors. Averaging the last two sums we see that $\Sigma_1$ is 4 times the first term in Prop. 3.

The second sum $\Sigma_2$ =

$\sum [p_ip_j\,d_p(\ln p_{\bar i}p_{\bar j}) - p_{\bar i}p_{\bar j}\,d_p(\ln p_{\bar i}p_{\bar j})] \otimes d_pL^S_{ij}$

$= \sum d^S_{ij}\,d_p(\ln p_{\bar i}p_{\bar j}) \otimes [d_p(\ln p_ip_j) - d_p(\ln p_{\bar i}p_{\bar j})]$.

$\Sigma_2$ in turn breaks in two. The second sum $\Sigma_{21}$ =

$-\sum d^S_{ij}\,d_p(\ln p_{\bar i}p_{\bar j}) \otimes d_p(\ln p_{\bar i}p_{\bar j})$

$= \sum d^S_{ij}\,d_p(\ln p_ip_j) \otimes d_p(\ln p_ip_j)$.

The latter equation is because $ij \to \bar i\bar j$ changes the sign of $d^S_{ij}$. Averaging these two we get the second term in Prop. 3.

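Finally, $\Sigma_{22}$ =

$\sum d^S_{ij}\,d_p(\ln p_{\bar i}p_{\bar j}) \otimes d_p(\ln p_ip_j)$

$= -\sum d^S_{ij}\,d_p(\ln p_ip_j) \otimes d_p(\ln p_{\bar i}p_{\bar j})$.

Averaging these two sums we get the final term in Prop. 3 by definition of the wedge product of two forms [24, Sec. 1.9]:

(1.20)    $\omega_1 \wedge \omega_2(Y^1,Y^2) = \tfrac12\,[\omega_1(Y^1)\omega_2(Y^2) - \omega_1(Y^2)\omega_2(Y^1)]$.

QED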


Remark: The latter two sums in (1.19) vanish on the Wright manifold $\mathring{\Lambda}$ since all of the $d^S_{ij}$'s are zero there. So if $p \in \mathring{\Lambda}$ the Hessian $H_pR^S$ is symmetric and $H_p(-R) = -\sum_S H_p(R^S)$ annihilates $T_p\mathring{\Lambda}$. Furthermore, if $R$ vanishes only on $\mathring{\Lambda}$, cf. Prop. III.2.6, then $H_p(-R)$ is clearly negative definite on the normal subspace $T_p\mathcal{D}$ to $T_p\mathring{\Lambda}$.

Finally, for the mutation field $N = \sum_{i,j} p_iN_{ij}\partial_j$, (1.2) implies:

(1.21)    $H_pN(Y^1,Y^2) = \sum_{i,j} p_j^{-1}N_{ij}Y^1_iY^2_j$.

This is never symmetric everywhere corresponding to the fact that the mutation field is frequently a gradient field with respect to the usual metric (cf. Thm. III.5.4 and the Remark following) but never is a gradient with respect to the Shahshahani metric.


2. The Wright Conjecture.


At least for the selection plus recombination field, the Wright Conjecture is essentially true in the zero epistasis cases. We consider these first.


1 Proposition: Consider the two locus, two allele model described in Cor. III.4.2. If the selection field $\bar\nabla(\tfrac12\bar m)$ has zero epistasis then the combined field $\bar\nabla(\tfrac12\bar m) - R$ admits a Lyapunov function on $\Delta$.


Proof : In this case the sum in III.(4.4) has essentially only one 

term, i.e. by III. (4.4) and III. (4.13), R is a positive constant 
times 2 • Since 7 (~ m) and R are orthogonal in the zero 

epistasis case, it follows that for F = m - (L^^) ^ 


<yi" ,) - r ^p f) p = v a + rQ M b 'iy L i; b)2 i'p 


for some positive constant 
and L^k(p) vanish. 


This is positive unless both 7 (“ m) 

P 2 

QED 


For larger models we can prove a local result:

2 Proposition: Suppose: (1) the recombination numbers $r^S_{ij}$ and birth rates $b_{ij}$ are completely symmetric and that the recombination field vanishes only on the Wright manifold (cf. Prop. III.2.6), and (2) fitness is completely symmetric and with zero epistasis so that equation III.(1.12) holds and on each $\mathring{\Delta}_\alpha$, $\bar m^\alpha$ has a (necessarily unique) non-degenerate critical point $q^\alpha$.

Then the point $q$ with $q(i) = \prod_\alpha q^\alpha(i_\alpha)$ is the unique equilibrium for the combined field $\bar\nabla(\tfrac12\bar m) - R$. For every $\epsilon > 0$ sufficiently small, $\bar m + \epsilon H$ is a Lyapunov function for the combined field on some neighborhood of $q$.

Proof: Since there is no epistasis, $\bar\nabla(\tfrac12\bar m)$ and $R$ are orthogonal. So the combined field vanishes at $p$ iff $R(p) = 0$ (and so $p \in \mathring{\Lambda}$) and $\bar\nabla_p(\tfrac12\bar m) = 0$, and so $p$ is a critical point of $\bar m$. Since $\bar m$ is the sum of the $\bar m^\alpha$'s, $p$ is a critical point of $\bar m$ iff $E^\alpha(p)$ is a critical point of $\bar m^\alpha$ for all $\alpha$. So as $q$ is the unique point in $\mathring{\Lambda}$ with $E^\alpha(q) = q^\alpha$ for all $\alpha$, $q$ is the unique equilibrium in $\mathring{\Delta}$ of the combined field.

To prove the Lyapunov function result it suffices to show that the function

(2.1)    $f(p) = (\bar\nabla_p(\tfrac12\bar m) - R,\ \bar\nabla_p(\bar m + \epsilon H))_p$

has a nondegenerate local minimum at $p = q$, with $f(q) = 0$. Since $\bar\nabla(\bar m)$ is orthogonal to $R$, $f$ is the sum of three functions (see III. Secs. 1,2):

(2.2)    $f_1(p) = (\bar\nabla_p(\tfrac12\bar m),\ \bar\nabla_p(\bar m))_p = 2\sum_i p_i(m_i - \bar m)^2$,
         $\epsilon f_2(p) = \epsilon(-R,\ \bar\nabla_pH)_p$,
         $\epsilon f_3(p) = \epsilon(\bar\nabla_p(\tfrac12\bar m),\ \bar\nabla_pH)_p$.

At $p = q$: $\bar\nabla_p\bar m$, $R$ and $\bar\nabla_pH$ all vanish so it is clear that $f(q) = 0$ and $d_qf = 0$, i.e. $q$ is a critical point. To complete the proof it is enough to show that for $\epsilon > 0$ small enough, the Hessian of the gradient of $f$ is positive definite on $T_q\Delta = (\mathbb{R}^I)_0$. We will sketch the argument omitting the computational details.

Apply (1.10) at $p = q$ noting that there $m_i = \bar m$ for all $i$ and so the first term in (1.10) vanishes and the second one simplifies to become:


(2.3) H^Y 1 ,* 2 ) = H g (7 fjMY^Y 2 ) 


2 ^ q i (m ij Y j )(m ik Y ^- 

ijk 


Now III. (1.12) and the nondegeneracy assumption implies that the 

annihilator of m. . is exactly = Kernel of E: F* -> n F (see 

13 J a a 

(Cof. 11.1.5(d)). Equivalently, the Kernel of the linear map 

(F 1 ) _ -> F 1 defined by Y -* (£ . m. .Y.) in B ^ . This means that the 
0 * D i] ] 

annihilator of H.. is T 5) and H n is positive definite on the ortho- 
1 q 1 


gonal complement T^A = T 

Applying the definition of the Hessian directly, one can 
show that 


(2.4) 


H 2 (Y 1 ,Y 2 ) = H q (7 f 2 )(Y 1 ,Y 2 ) 


r S .b. .Q S . (7 L S .,Y ) (7L S .,Y 

/ . id id id q ID 1 P q ID 


2 } p* 


ijS 


H 0 


the 7 L S . 

q ID 


clearly annihilates T A and since 

c 

s with r. .b. . > 0 span T S). So H_ 

id id q 2 


R vanishes only on Aq 
is positive definite on 


T ©. 

q 


$H_3 = H_q(\bar\nabla f_3)$ is somewhat messy to compute. But because of the factor $\bar\nabla\bar m$ in the definition it is not hard to show that $H_3(Y^1,Y^2) = 0$ if both $Y^1$ and $Y^2 \in T_q\mathcal{D}$. So in particular, $H_2 + H_3$ is the same as $H_2$ on $T_q\mathcal{D}$ and so is positive definite there.

So $H_1$ is positive definite on $T_q\mathring{\Lambda}$ and annihilates $T_q\mathcal{D}$. On the annihilator $T_q\mathcal{D}$, $H_2 + H_3$ is positive definite. It then follows from a linear algebra argument that for $\epsilon > 0$ sufficiently small $H_1 + \epsilon(H_2 + H_3)$ is positive definite on all of $T_q\mathring{\Delta}$. The argument is essentially one used by Smale in an economics context [30]. The precise lemma is stated and proved in [2, Thm. 2.3]. So for $\epsilon > 0$ sufficiently small $H_q(\bar\nabla f)$ is positive definite and so $f$ has a nondegenerate local minimum at $q$. QED


Remark: There is a spurious argument to show that $\bar m + \epsilon H$ works on $\mathring{\Delta} - G$ where $G$ is any neighborhood of the boundary $\Delta - \mathring{\Delta}$. It goes as follows. $f_1$ is nonnegative on $\mathring{\Delta}$ vanishing only on the fibre $\mathcal{D}_q = E^{-1}(\{q^\alpha\})$. $f_2$ is nonnegative on $\mathring{\Delta}$ vanishing only on $\mathring{\Lambda}$. $f_3$ vanishes on $\mathcal{D}_q \cup \mathring{\Lambda}$ but is otherwise not determined. Recall the diffeomorphism $E \times L: \mathring{\Delta} \to \prod_\alpha \mathring{\Delta}_\alpha \times \mathbb{R}^d$. Now fix $G$ and choose $V_0$ a small neighborhood of 0 in $\mathbb{R}^d$. The subset $L^{-1}(\mathbb{R}^d - V_0) - G$ is a compact subset of $\mathring{\Delta}$ disjoint from $\mathring{\Lambda}$ and so on it $f_2$ is positive and bounded away from 0. Since $\bar\nabla(\tfrac12\bar m)$ vanishes on $E^{-1}(\{q^\alpha\})$ there exists a small neighborhood $U_0$ of $\{q^\alpha\}$ in $\prod_\alpha \mathring{\Delta}_\alpha$ such that on $E^{-1}(U_0) \cap (L^{-1}(\mathbb{R}^d - V_0) - G) = (E \times L)^{-1}(U_0 \times (\mathbb{R}^d - V_0)) - G$: $f_2 > |f_3|$. So on this set $f = f_1 + \epsilon(f_2 + f_3) > 0$ for all $\epsilon > 0$. Now $E^{-1}((\prod_\alpha \mathring{\Delta}_\alpha) - U_0) - G$ is a compact subset of $\mathring{\Delta}$ disjoint from $\mathcal{D}_q$ and so on it $f_1$ is bounded below and $|f_3|$ is bounded above. So for $\epsilon$ small enough $f_1 > \epsilon|f_3|$ on this set. So on this set, too, $f > 0$. In short, for an arbitrarily small neighborhood $(E \times L)^{-1}(U_0 \times V_0)$ of $q$ and $0 < \epsilon < \epsilon_0$ depending on $U_0 \times V_0$ and $G$, $f > 0$ on $\mathring{\Delta} - (E \times L)^{-1}(U_0 \times V_0) - G$. By the above Prop. we can choose $\epsilon_1 > 0$ so that with $0 < \epsilon < \epsilon_1$ $f$ is positive on some punctured neighborhood of $q$. The problem is with the order of choices. Fix $\epsilon < \epsilon_1$ and get $U_0, V_0$ so that $f$ is positive on $(E \times L)^{-1}(U_0 \times V_0) - \{q\}$. The $\epsilon_0$ needed to get $f$ positive on the complement of $(E \times L)^{-1}(U_0 \times V_0)$ might be smaller than the $\epsilon$ that was fixed to get $U_0 \times V_0$. The best we can do with all this is to say that $\epsilon$ can be chosen small enough so that the open set $\{p: f(p) < 0\}$ consists of pieces either very close to $\Delta - \mathring{\Delta}$ or to $q$ and furthermore that its closure is disjoint from $\mathring{\Lambda} \cup \mathcal{D}_q$. A sharper argument can probably show that the piece near $q$ is in fact empty and so $\bar m + \epsilon H$ works away from the boundary of $\Delta$.


We now turn to the main theorem of the chapter.

3 Theorem: Let $X$ be a smooth vectorfield on $\mathring{\Delta}$ which is not a gradient field with respect to the Shahshahani metric. There exists a smooth one-parameter family of symmetric matrices $m^\lambda_{ij}$ ($\lambda$ in some neighborhood of 0 in $\mathbb{R}$) such that at $\lambda = 0$ a Hopf bifurcation occurs in the family of vectorfields $\bar\nabla(\tfrac12\bar m^\lambda) + X$. In detail, there exists a point $q$ of $\mathring{\Delta}$ such that

(a) $\bar\nabla_q(\tfrac12\bar m^\lambda) + X(q) = 0$ for all $\lambda$. So $q$ is an equilibrium point for every vectorfield in the family.

(b) With respect to $( , )_q$ the Hessian $H_q(\bar\nabla(\tfrac12\bar m^\lambda) + X)$ has eigenvalues with negative real parts for $\lambda < 0$ and as $\lambda$ crosses 0 exactly one pair of complex conjugate eigenvalues (with nonzero imaginary part) cross the imaginary axis. If $\rho(\lambda)$ is the real part of this eigenvalue pair then

$\frac{d\rho}{d\lambda} > 0$ at $\lambda = 0$.


Proof: Since $X$ is not a gradient, Thm. 1.2 implies that there exists a point $q \in \mathring{\Delta}$ such that $AH_q(X) \neq 0$. Choose such a point and fix it. $q$ will be the equilibrium point.

We now construct $m^\lambda_{ij}$ in pieces following I.(6.6) by choosing $\bar m^\lambda$ at $q$, $m^\lambda_i - \bar m^\lambda$ at $q$ and then $\theta^\lambda_{ij}$.

$\bar m^\lambda$ at $q$ is arbitrary, choose it to be 0.

$m^\lambda_i - \bar m^\lambda = m^\lambda_i$ at $q$ is determined by the condition that $\bar\nabla_q(\tfrac12\bar m^\lambda) + X(q) = 0$. So define

(2.5)    $k_i = m^\lambda_i - \bar m^\lambda = m^\lambda_i = -X_i(q)/q_i$.

So this part will not depend on $\lambda$. Define

(2.6)    $H^a(Y^1,Y^2) = \sum_i q_i^{-1}k_iY^1_iY^2_i$    $(Y^1,Y^2 \in (\mathbb{R}^I)_0)$.


By (1.18) the symmetric part of the Hessian at $q$ of $\bar\nabla(\tfrac12\bar m^\lambda) + X$ consists of $SH_q(X) + H^a$ plus another term depending only on the choice of $\theta^\lambda_{ij}$. Now I claim that for any symmetric bilinear form $H^\lambda$ on $(\mathbb{R}^I)_0$ there exists a unique choice of $\theta^\lambda_{ij}$ such that

(2.7)    $H^\lambda(Y^1,Y^2) = \sum_{i,j} \theta^\lambda_{ij}Y^1_iY^2_j$    $(Y^1,Y^2 \in (\mathbb{R}^I)_0)$.

The condition that says that $\theta^\lambda_{ij}$ is the pure dominance term at $q$ is

(2.8)    $\theta^\lambda_i(q) = \sum_j q_j\theta^\lambda_{ij} = 0$    $(i \in I)$.

(2.8) says that, regarded as a symmetric bilinear form on $\mathbb{R}^I$, the matrix $\theta^\lambda_{ij}$ annihilates $q$. Since $\mathbb{R}^I = (\mathbb{R}^I)_0 \oplus \{tq: t \in \mathbb{R}\}$, we can uniquely extend the $(\mathbb{R}^I)_0$ form to a symmetric bilinear form on $\mathbb{R}^I$ by defining:

(2.9)    $H^\lambda(q,Y) = H^\lambda(Y,q) = 0$    $(Y \in \mathbb{R}^I)$.

So given any $H^\lambda$ on $(\mathbb{R}^I)_0$ extend it to $\mathbb{R}^I$ by (2.9) and bilinearity. The associated symmetric matrix $\theta^\lambda_{ij}$ of the extended form satisfies (2.8) from (2.9) and (2.7) by definition of the matrix of a bilinear form.
We are left with choosing a one-parameter family $H^\lambda$ of symmetric bilinear forms on $(\mathbb{R}^I)_0$. When they are chosen then the Hessian at $q$ of the combined field becomes:

(2.10)    $H^\lambda + H^a + SH_q(X) + AH_q(X)$.

So the alternating part $AH_q(X)$ is fixed and the symmetric part $H^\lambda + H^a + SH_q(X)$ is arbitrary. So we can choose it to add constant negative real parts to all of the eigenvalues of $AH_q(X)$ except for one imaginary pair and there let the real part that is added on be $\lambda$.

In detail, define the linear map $L: (\mathbb{R}^I)_0 \to (\mathbb{R}^I)_0$ by

(2.11)    $(L(Y^1),Y^2)_q = AH_q(X)(Y^1,Y^2)$.

With respect to the inner product $( , )_q$, $L$ is a skew-symmetric operator. So we can choose an $( , )_q$ orthonormal basis $Y^1,\ldots,Y^{n-1}$ such that for real numbers $t_1,\ldots,t_k$, $0 < k \le (n-1)/2$:

(2.12)    $L(Y^1) = t_1Y^2$,    $L(Y^2) = -t_1Y^1$,
          $L(Y^3) = t_2Y^4$,    $L(Y^4) = -t_2Y^3$,
          $\ldots$
          $L(Y^{2k-1}) = t_kY^{2k}$,    $L(Y^{2k}) = -t_kY^{2k-1}$,
          $L(Y^i) = 0$    $(2k < i \le n-1)$.

Now define $S^\lambda: (\mathbb{R}^I)_0 \to (\mathbb{R}^I)_0$ by

(2.13)    $S^\lambda(Y^1) = \lambda Y^1$,    $S^\lambda(Y^2) = \lambda Y^2$,
          $S^\lambda(Y^i) = -Y^i$    $(2 < i \le n-1)$.
The associated symmetric form is defined by:

(2.14)    $H^\lambda_0(Y^1,Y^2) = (S^\lambda(Y^1),Y^2)_q$.

Finally, define

(2.15)    $H^\lambda = H^\lambda_0 - H^a - SH_q(X)$.

This defines $\theta^\lambda_{ij}$ and hence $m^\lambda_{ij}$ with Hessian at $q$ equal to (cf. (2.10))

(2.16)    $H^\lambda_0 + AH_q(X)$.

QED
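
The eigenvalue behaviour produced by this construction can be seen in a small numerical sketch. It works directly in the orthonormal basis of (2.12) and (2.13), where the Hessian (2.16) has matrix $S^\lambda + L$. The particular numbers $t_1 = 2$, $t_2 = 0.5$ and the choice of dimension $2k + 1$ are assumptions made only for illustration.

```python
import numpy as np

def combined_matrix(lam, t):
    """Matrix of S^lambda + L in the basis Y^1, ..., Y^{n-1} of (2.12)-(2.13).

    t holds the nonzero numbers t_1, ..., t_k; the dimension n-1 is taken to be
    2k + 1 here, so that exactly one basis vector lies in the kernel of L.
    Column s holds the coordinates of the image of the s-th basis vector.
    """
    k = len(t)
    dim = 2 * k + 1
    L = np.zeros((dim, dim))
    for r, t_r in enumerate(t):
        L[2 * r + 1, 2 * r] = t_r        # L(Y^{2r+1}) =  t_{r+1} Y^{2r+2}
        L[2 * r, 2 * r + 1] = -t_r       # L(Y^{2r+2}) = -t_{r+1} Y^{2r+1}
    S = -np.eye(dim)
    S[0, 0] = S[1, 1] = lam              # S^lambda multiplies Y^1, Y^2 by lambda
    return S + L

def leading_real_part(lam, t):
    """Largest real part among the eigenvalues of S^lambda + L."""
    return np.linalg.eigvals(combined_matrix(lam, t)).real.max()

t = [2.0, 0.5]
ev_before = np.linalg.eigvals(combined_matrix(-0.1, t))
ev_after = np.linalg.eigvals(combined_matrix(0.1, t))
h = 1e-3
drho = (leading_real_part(h, t) - leading_real_part(-h, t)) / (2 * h)
```

In this basis the eigenvalues are $\lambda \pm it_1$ together with $-1 \pm it_r$ $(r \ge 2)$ and $-1$, so only the first pair crosses the imaginary axis, and it does so with speed $d\rho/d\lambda = 1$.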


We now apply the bifurcation theorem of Hopf [22, Thm. 3.15] and get

4 Corollary: Let $X$ be a smooth vectorfield on $\mathring{\Delta}$ which is not a gradient field with respect to the Shahshahani metric (e.g. the recombination field $R$ or the mutation field $N$). Then there exists a symmetric matrix $m_{ij}$ of fitness constants such that the combined field $\bar\nabla(\tfrac12\bar m) + X$ has a nontrivial (i.e. nonequilibrium) periodic orbit.


The theorem and its corollary leave open some important questions:

What is the role of position effects? That is, can the $m_{ij}$'s be chosen completely symmetric? For the proof we needed the power to construct any symmetric form and this required us to range over all symmetric matrices $m_{ij}$.

What about the stability of the cycles? Can attracting, i.e. limit, cycles occur? In any case are they structurally stable?

We can't answer questions like these by the pure thought arguments of the above proof. The construction and detailed analysis of examples is needed. For example, if examples can be constructed satisfying the "vague attractor" condition of [21] then both of the latter questions would have affirmative answers. The place to go for examples is the simplest case. These cycles can occur in the two allele model, by the above corollary. Investigation of this case is in progress.



Appendix 


1. Proper Mappings.

Lemma II.1.2 is a special case of the following:

Theorem: Let $f: X \to Y$ be a topologically proper local homeomorphism of locally 1-connected Hausdorff spaces (1-connected means connected and simply connected). $f$ is a finite covering space. In particular, if $X$ is connected and $Y$ is 1-connected then $f$ is a homeomorphism.

This in turn follows from:

Lemma: Let $f: X \to Y$ be a topologically proper local homeomorphism of Hausdorff spaces. $f$ has the unique homotopy lifting property: Let $Z$ be a compact, locally connected Hausdorff space and assume that we have a commutative diagram of continuous maps:

        Z x 0      ---------->   X
          |                      |
          |                      | f
          v                      v
        Z x [0,1]  -----G---->   Y

$G$ lifts uniquely to a continuous map $\tilde G: Z \times [0,1] \to X$ with $\tilde G_0$ the given map and $f \circ \tilde G = G$.


Proof: (1) Because $f$ is a local homeomorphism of Hausdorff spaces, path lifts, when they exist, are unique.

(2) For $y \in Y$, $f^{-1}y$ is compact because $f$ is proper and it is discrete because $f$ is a local homeomorphism. So $f^{-1}y$ is finite.

(3) Define $0 \le t^* \le 1$ to be the supremum of the set $\{s: G|Z \times [0,s]$ lifts continuously to $\tilde G$ with $\tilde G_0$ the given map$\}$.

(4) We show that the supremum is achieved. This is clear if $t^* = 0$. If $t^* > 0$ then by (1) $\tilde G$ can be defined uniquely to lift $G$ on $Z \times [0,t^*)$ where the latter factor is open on the right. Now $f^{-1}G(Z \times [0,1])$ is compact and so for each $z \in Z$ the limit set

$S_z = \bigcap_{0<t<t^*} \text{closure}\,[\tilde G(z \times [t,t^*))]$

is the decreasing intersection of nonempty compact connected sets and so is nonempty, compact and connected (Kelley, [19], p. 163). Furthermore, if $x \in S_z$ then there is a sequence $t_n$ converging up to $t^*$ such that $x = \text{Lim}\, \tilde G(z,t_n)$ (Kelley, [19], p. 72). So $f(x) = \text{Lim}\, G(z,t_n) = G(z,t^*)$. Thus, $S_z$ is a nonempty connected subset of the finite set $f^{-1}G(z,t^*)$. So $S_z$ consists of a single point which we will denote $\tilde G(z,t^*)$. We must show that this extension is continuous. Let $U$ and $V$ be neighborhoods of $\tilde G(z,t^*)$ and $G(z,t^*)$ respectively such that $f: U \to V$ is a homeomorphism. $G^{-1}V$ is a neighborhood of $(z,t^*)$ and so by Wallace's Lemma (Kelley, [19, p. 142]) there exists $\epsilon > 0$ and $W$ a connected neighborhood of $z$ such that $W \times [t^* - \epsilon, \min(t^* + \epsilon, 1)] \subset G^{-1}V$. Now $\tilde G|W \times [t^* - \epsilon, t^*)$ and $(f|U)^{-1} \circ G|W \times [t^* - \epsilon, t^*)$ agree on $z \times [t^* - \epsilon, t^*)$ and so by connectedness they are equal. The latter map extends continuously to $W \times [t^* - \epsilon, \min(t^* + \epsilon, 1)]$ and so gives a continuous extension of the former.

(5) We have just shown not only that $\tilde G$ extends to $Z \times [0,t^*]$ but also that it extends to an open subset of $Z \times [0,1]$ containing $Z \times [0,t^*]$, namely to the union of $Z \times [0,t^*]$ and the union of the family $\{W \times [t^* - \epsilon, \min(t^* + \epsilon, 1))\}$ indexed by $z \in Z$. If $t^* < 1$, Wallace's Lemma again and compactness of $Z$ imply that the open set contains some $Z \times [0,t^{**}]$ with $t^* < t^{**} \le 1$. This contradicts the definition of $t^*$. So $t^* = 1$. QED

Remark: Note that if $Z_0 \subset Z$ and we began with a lifting defined on $Z \times 0 \cup Z_0 \times [0,1]$ our lift $\tilde G$ agrees with the given lift on $Z_0 \times [0,1]$ by uniqueness.

Proof of Theorem: (See Spanier, [31, p. 78].) Let $V$ be a 1-connected open neighborhood of $y_0 \in Y$. Let $\{x_1,\ldots,x_n\} = f^{-1}y_0$ and let $U_i$ be the arc-component of $x_i$ in $f^{-1}V$. We show that $U_i \cap U_j = \emptyset$ if $i \neq j$ and $f: U_i \to V$ is bijective for all $i$.

(1) Any point $y_1$ of $V$ can be connected by a path $\gamma$ in $V$ to $y_0$. There are $n$ lifts of this path beginning at $x_1,\ldots,x_n$ respectively. Thus, every point of $f^{-1}V$ connects in $f^{-1}V$ to a point of $f^{-1}y_0$. So $f^{-1}V = \bigcup_i U_i$ and $fU_i = V$ for each $i$.

(2) If a point of $f^{-1}V$ connects in $f^{-1}V$ to both $x_i$ and $x_j$ (i.e. $U_i \cap U_j \neq \emptyset$), then $x_i$ and $x_j$ can be connected by a path in $f^{-1}V$. This path projects to a loop in $V$ which is homotopic in $V$ to the constant path at $y_0$. By the Lemma this homotopy lifts to a homotopy of the original path to a path connecting $x_i$ and $x_j$ in $f^{-1}y_0$ which is discrete. So $i = j$. Contrapositively, $U_i \cap U_j = \emptyset$ if $i \neq j$.

(3) The argument of (2) also shows that $f|U_i$ is injective for each $i$, because no two points in the same fibre can be connected by a path in $f^{-1}V$.

QED
2. Partially Defined Distributions.

The conjecture remarked upon after Thm. II.2.7 can be written as:

Property I(K): For a complex $K$, every compatible family of distributions on the subproducts $I_S$ corresponding to the blocs $S$ in $K$ is induced by some distribution on $I$, i.e.

(2.1)    $E^K(\Delta) = V_K \cap \prod\{\Delta_S : S \in K\}$.

A related conjecture is:

Property H(K): For a complex $K$, if a family of interior distributions $\{p^S : S \in K\}$ ($\in \prod\{\mathring{\Delta}_S : S \in K\}$) is induced by some distribution $p$ on $I$, i.e. $E^K(p) = \{p^S\}$, then the distribution of maximum entropy $\pi$ with $E^K(\pi) = \{p^S\}$ is an interior distribution, i.e. $\pi \in \mathring{\Delta}$.

For our examples, let $\ell = 3$ and $I_\alpha = \{0,1\}$ for $\alpha = 1,2,3$. The points of $I$ are the vertices of the unit cube in $\mathbb{R}^3$. Let $K$ consist of all subsets of $L = \{1,2,3\}$ except for $L$ itself. So we are given compatible families of pairwise distributions and are looking for a distribution on the product.

For our first example, let $p_0$ put weight 1/6 on each of the vertices except for the diagonal pair (0,0,0) and (1,1,1) and weight zero on these. $p_0 \in \Delta - \mathring{\Delta}$ but projects to a family of interior distributions. In fact, the projection to each face of the cube puts on weights: 1/3, 1/3, 1/6 and 1/6. To show that $p_0$ is the member of $\Delta$ with maximum entropy among those with the specified projections, we will show that it is the only such member of $\Delta$.

In this case the Kernel of $E^K$ is one dimensional consisting of all multiples of the vector $x$ satisfying

(2.2)    $x(i_1,i_2,i_3) = (-1)^\sigma$ where $\sigma = i_1 + i_2 + i_3$.

So for any vector in the kernel, as one steps from one vertex to another along an edge the value of $x$ just changes sign. In particular, $x(0,0,0)$ and $x(1,1,1)$ have opposite signs. Hence $p_0 + tx \in \Delta$ iff $t = 0$.

For our second example, we use the leverage indicated by the first. For $\epsilon > 0$, let $x_\epsilon$ take the value $(1 + \epsilon)/6$ on the six points other than (0,0,0) and (1,1,1) and let $x_\epsilon$ be $-\epsilon/2$ on each of these elements. For $\epsilon$ small (in fact, for $\epsilon < 1/2$) $x_\epsilon$ projects under $E^K$ to an element of $\prod(\mathring{\Delta}_S)$. But for all $t$, $x_\epsilon + tx$ is negative on either (0,0,0) or (1,1,1). So no member of $\Delta$ maps to $E^K(x_\epsilon) \in V_K \cap \prod(\mathring{\Delta}_S)$.

To be specific, with $\epsilon = 1/4$, consider the distribution on $\{0,1\} \times \{0,1\}$:

(2.3)    $p(0,0) = p(1,1) = 1/12$,
         $p(0,1) = p(1,0) = 5/12$.

Putting this distribution on each of the pairwise subproducts of $I = \{0,1\}^3$ yields a compatible family of distributions which is not induced by any distribution on I.
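
Both cube examples can be verified with exact rational arithmetic. The sketch below is only a check of the arithmetic, with function names of our own choosing. It takes from the text the fact that the kernel of $E^K$ is spanned by the vector $x$ of (2.2), so every preimage of a given family of pairwise projections has the form $x_\epsilon + tx$; it then computes the pairwise projections and the range of $t$ for which $x_\epsilon + tx$ is nonnegative.

```python
from fractions import Fraction
from itertools import product

VERTICES = list(product((0, 1), repeat=3))

def pairwise_marginal(dist, a, b):
    """Project a function on the cube onto the coordinates (a, b)."""
    out = {}
    for v, w in dist.items():
        key = (v[a], v[b])
        out[key] = out.get(key, Fraction(0)) + w
    return out

def x_eps(eps):
    """(1 + eps)/6 on the six off-diagonal vertices, -eps/2 on (0,0,0) and (1,1,1)."""
    diagonal = {(0, 0, 0), (1, 1, 1)}
    return {v: (-eps / 2 if v in diagonal else (1 + eps) / 6) for v in VERTICES}

# the kernel vector x of (2.2): alternating signs on the vertices of the cube
kernel = {v: Fraction((-1) ** sum(v)) for v in VERTICES}

def feasible_t_interval(dist):
    """Bounds (lower, upper) on t such that dist + t * kernel >= 0 at every vertex.

    The set of admissible t is empty when lower > upper.
    """
    lower = max(-dist[v] for v in VERTICES if kernel[v] > 0)
    upper = min(dist[v] for v in VERTICES if kernel[v] < 0)
    return lower, upper

p0 = x_eps(Fraction(0))               # first example: weight 1/6 off the diagonal
second = x_eps(Fraction(1, 4))        # second example with eps = 1/4
marginal_p0 = pairwise_marginal(p0, 0, 1)
marginal_second = pairwise_marginal(second, 0, 1)
interval_p0 = feasible_t_interval(p0)
interval_second = feasible_t_interval(second)
```

For $p_0$ the admissible interval collapses to the single value $t = 0$, and for $\epsilon = 1/4$ the lower bound $1/8$ exceeds the upper bound $-1/8$, so no distribution on the cube has the projections (2.3).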

Proposition: For any complex $K$ and any family of distributions $\{p^S : S \in K\}$ in the image $E^K(\Delta)$, let $\pi$ be the distribution of maximum entropy mapping to $\{p^S\}$ under $E^K$. $\{p^S\} \in \Delta_K - \mathring{\Delta}_K$, i.e. the entire preimage $(E^K|\Delta)^{-1}\{p^S\} \subset \Delta - \mathring{\Delta}$, iff $\pi \in \Delta - \mathring{\Delta}$.

Proof: If $E^K(p_0) = E^K(p_1) = \{p^S\}$, then the whole segment $p_t = p_0 + t(p_1 - p_0)$ maps under $E^K$ to $\{p^S\}$. Suppose $p_0 \in \Delta - \mathring{\Delta}$ and $p_1 \in \mathring{\Delta}$. I claim that for $t > 0$ and small enough $H(p_t) > H(p_0)$ where $H$ is the entropy. So $p_0$ cannot be $\pi$. Thus, if there is any interior distribution in the fibre $(E^K|\Delta)^{-1}\{p^S\}$ then $\pi$ is interior.

The function $H(p_t)$ is strictly increasing near $t = 0$ because as a function of $t$ it is differentiable for $t > 0$ and the derivative approaches $+\infty$ as $t$ approaches 0. This is because the derivative of $-t \ln t$ approaches $+\infty$ as $t$ approaches 0. QED

Actually the above result can be proved directly from Thm. II.1.6.

Corollary: For any complex $K$, the property H(K) implies I(K).

Proof: If (2.1) is false, i.e. $E^K(\Delta) = \Delta_K$ is a proper subset of $V_K \cap \prod\{\Delta_S\}$, then there must exist $\{p^S_0\} \in V_K \cap \prod\{\mathring{\Delta}_S\} - \Delta_K$. Let $\{p^S_1\} \in \mathring{\Delta}_K = E^K(\mathring{\Delta})$. The segment between them lies in $V_K \cap \prod\{\mathring{\Delta}_S\}$ and must meet some point $\{p^S_t\}$ of $E^K(\Delta) - E^K(\mathring{\Delta})$. By the Proposition, the maximum entropy distribution $\pi$ mapping to $\{p^S_t\}$ must lie in $\Delta - \mathring{\Delta}$. This contradicts H(K). Taking the contrapositive, we get the corollary.

QED
3. Game Dynamics.

In a recent, elegant paper [33] Taylor and Jonker give a dynamic interpretation of the concept due to Maynard Smith and Price of an evolutionarily stable strategy in a biological game. Their dynamic model turns out to be identical to the vectorfield model of frequency dependent selection. Using the concept of the Hessian from Chap. IV we get a more conceptual proof of their main result.

In this case $I$ is the set of $n$ strategies and $p_i$ is the proportion of the population using strategy $i$. $F(i|p)$ denotes the payoff to a player using strategy $i$ when the strategy distribution vector of the population is $p$. Taylor and Jonker examine the differential equation:

(3.1)    $\frac{dp_i}{dt} = p_i(F(i|p) - F(p|p))$.

Here $F(p|p) = \sum_i p_iF(i|p)$. So defining $\xi_i(p) = F(i|p) - F(p|p)$ we see that this equation comes from the vectorfield on $\Delta$:

$X(p) = \sum_i p_i\xi_i(p)\partial_i$.

Now define:

(3.2)    $a_{ij} = \frac{\partial F(i|p)}{\partial p_j}$

where this really means extend the functions $F(i|p)$ to functions $F(i|x)$ on $\mathbb{R}^I$ and let $a_{ij} = \partial F(i|x)/\partial x_j$.

An equilibrium $p \in \Delta$, i.e. $F(i|p) = F(p|p)$ for all $i$, is called an evolutionarily stable equilibrium or ESS if

(3.3)   $\sum_{i,j} a_{ij} Y_i Y_j < 0$   $(0 \neq Y \in (\mathbb{R}^I)_0)$.

Since $\xi_i(p) = 0$ for all $i$, it is easy to check from IV.(1.2) that the Hessian of $X$ at $p$ is given by:

(3.4)   $H_p(Y^1, Y^2) = \sum_{i,j} a_{ij} Y^1_j Y^2_i$   $(Y^1, Y^2 \in (\mathbb{R}^I)_0)$.

So $p$ is an ESS iff $H_p(Y,Y) < 0$ for $Y \neq 0$ in $(\mathbb{R}^I)_0$.
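Condition (3.3) is a finite check once the numbers $a_{ij}$ are known. The sketch below is one hedged way to test it in Python. It assumes the $a_{ij}$ are given as a square list of lists, restricts the quadratic form to the basis $e_k - e_n$ of sum-zero vectors, and applies Sylvester's criterion to the negated, symmetrized form. The function names and the example matrices are hypothetical, and the routine tests only (3.3), not whether $p$ is an equilibrium.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def negative_definite_on_sum_zero(a):
    """True iff sum_ij a[i][j] Y_i Y_j < 0 for every nonzero Y with sum(Y) == 0."""
    n = len(a)
    last = n - 1

    def bil(k, l):
        # Value of the bilinear form on the basis vectors b_k = e_k - e_last, b_l = e_l - e_last.
        return a[k][l] - a[k][last] - a[last][l] + a[last][last]

    # Negated symmetrized form on the (n-1)-dimensional sum-zero subspace.
    m = [[-0.5 * (bil(k, l) + bil(l, k)) for l in range(last)] for k in range(last)]
    # Sylvester's criterion: all leading principal minors must be positive.
    return all(det([row[:k] for row in m[:k]]) > 0 for k in range(1, n))


hawk_dove = [[-1.0, 2.0], [0.0, 1.0]]                                   # hypothetical payoffs
rock_paper_scissors = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
print(negative_definite_on_sum_zero(hawk_dove))
print(negative_definite_on_sum_zero(rock_paper_scissors))
```

In the anti-symmetric rock-paper-scissors example the quadratic form vanishes identically, so the strict inequality in (3.3) fails.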

We now apply the following classical theorem of Lyapunov. It 
is a matrix theory result with a simple differential equation proof. 

Lemma: Let $V$ be a Euclidean vector space with inner product $( , )$. Let $A: V \to V$ be a linear map with associated bilinear form:

(3.5)   $H(Y^1, Y^2) = (AY^1, Y^2)$.

With the symmetrized form $SH(Y^1, Y^2) = \frac{1}{2}(H(Y^1, Y^2) + H(Y^2, Y^1))$ is associated the self-adjoint linear map $A_S: V \to V$ by

(3.6)   $SH(Y^1, Y^2) = (A_S Y^1, Y^2)$.

$A_S$ has only negative real eigenvalues iff $SH$ is negative definite, i.e. $SH(Y,Y) < 0$ if $Y \neq 0$, and in that case the eigenvalues of $A$ have negative real parts.

Proof: Recall that with respect to an orthonormal basis the matrix of the bilinear form and the associated linear map are the same (see Lemma IV.1.1). Since a self-adjoint operator can be diagonalized with respect to some orthonormal basis, it easily follows that $SH$ is negative definite iff $A_S$ has only negative eigenvalues. If this is true then define the differential equation on $V$:

$\frac{dY}{dt} = AY$.

Let $F(Y) = -\frac{1}{2}(Y,Y)$. Then

$\frac{dF}{dt} = -(AY,Y) = -H(Y,Y) = -SH(Y,Y) > 0$   $(Y \neq 0)$.

So $F(Y)$ is a Lyapunov function for this equation and every solution of the differential equation approaches 0 asymptotically. This implies that the eigenvalues of $A$ have negative real parts. For a real eigenvalue $a$ gives a solution of the form $e^{at} Y_0$. A complex conjugate pair $a \pm ib$ gives solutions of the form $e^{at}[\cos bt \, Y_1 + \sin bt \, Y_2]$. These only approach 0 as $t$ goes to $\infty$ if $a < 0$. QED
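The lemma runs in one direction only, which a small numerical check makes concrete. The following Python sketch is an illustration under assumptions: it treats only $2 \times 2$ real matrices, computes eigenvalues from the trace and determinant, and uses two hypothetical matrices `A` and `B`. `A` has a negative definite symmetric part, so its eigenvalues must have negative real parts. `B` has eigenvalues with negative real parts although its symmetric part is not negative definite.

```python
import cmath


def eigenvalues_2x2(m):
    """Eigenvalues of a real 2x2 matrix from its trace and determinant."""
    trace = m[0][0] + m[1][1]
    determinant = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * determinant)
    return ((trace + root) / 2.0, (trace - root) / 2.0)


def symmetric_part(m):
    """Matrix of the symmetrized form SH, i.e. (m + m^T) / 2."""
    n = len(m)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]


A = [[-1.0, 5.0], [-5.0, -1.0]]   # hypothetical: symmetric part is -identity
B = [[-1.0, 10.0], [0.0, -1.0]]   # hypothetical: stable, but SH is indefinite

for name, m in (("A", A), ("B", B)):
    eig = eigenvalues_2x2(m)
    eig_sym = eigenvalues_2x2(symmetric_part(m))
    print(name, "eigenvalues:", eig, "symmetric-part eigenvalues:", eig_sym)
```

So negative definiteness of $SH$ is sufficient for the eigenvalues of $A$ to have negative real parts, but the example `B` suggests it is not necessary.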


Theorem: Let $p$ be an ESS in $\Delta$. Then $p$ is an asymptotically stable equilibrium.

Proof: Since $p$ is an ESS the symmetrized Hessian $SH_p(X)$ is negative definite. So by the above lemma and Lemma IV.1.1 the linearization of the vectorfield, $d_pX: T_p\Delta \to T_p\Delta$, has eigenvalues with negative real parts. So $p$ is asymptotically stable. QED


Jonker and Taylor also prove the above theorem when the equilibrium lies on the boundary $\bar\Delta - \Delta$, with an adjustment of the definition of an ESS. As we have not dealt with problems at the boundary earlier we won't pursue that direction.

The payoff function $F(i|p)$ is a generalization of the payoff matrix $a_{ij}$ = payoff $i$ receives playing against $j$. In that case $F(i|p) = \sum_j p_j a_{ij} = a_{ip}$. For the linear game the vectorfield is $X = \sum_i p_i (a_{ip} - a_{pp}) \partial_i$. This is just like our selection field except that $a_{ij}$ need not be symmetric. If $a_{ij}$ is symmetric a game theorist would call the game completely cooperative. Each player benefits as much as his "opponent" and so the population cooperatively evolves to the point of maximum mean payoff $a_{pp}$. This is because in the symmetric case the vectorfield is the gradient $\nabla(\frac{1}{2} a_{pp})$. In fact, this case is just a notational change from the selection field. I like thinking of this as saying that sex is a cooperative game.
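The claim that a symmetric game climbs the mean payoff can be checked on an example. The sketch below is a hedged illustration in Python: it assumes a forward-Euler discretization with a small step, a hypothetical symmetric $3 \times 3$ payoff matrix, and hypothetical helper names. It records $a_{pp}$ along one trajectory so that its monotonicity can be inspected.

```python
def mean_payoff(a, p):
    """a_pp = sum_ij a[i][j] * p[i] * p[j]."""
    n = len(p)
    return sum(a[i][j] * p[i] * p[j] for i in range(n) for j in range(n))


def euler_step(a, p, dt):
    """One Euler step of dp_i/dt = p_i (a_ip - a_pp) for the linear game."""
    n = len(p)
    a_ip = [sum(a[i][j] * p[j] for j in range(n)) for i in range(n)]
    a_pp = sum(p[i] * a_ip[i] for i in range(n))
    return [p[i] + dt * p[i] * (a_ip[i] - a_pp) for i in range(n)]


# Hypothetical symmetric payoff matrix, chosen only for illustration.
a = [[1.0, 2.0, 0.0],
     [2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0]]
p = [0.5, 0.3, 0.2]

history = [mean_payoff(a, p)]
for _ in range(5000):
    p = euler_step(a, p, dt=0.01)
    history.append(mean_payoff(a, p))

print(history[0], history[-1])
```

Along the exact flow the derivative of $a_{pp}$ is twice the variance of the payoffs $a_{ip}$ under $p$, which is nonnegative. The discrete trajectory here is only an approximation of that flow, so the step size is kept small.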

This suggests that the general notion of a cooperative game is one where the vectorfield associated to the payoff function $F(i|p)$ is a gradient with respect to the Shahshahani metric. Thm. IV.1.2 provides a test for when this is true, namely when the Hessian is everywhere symmetric.

At the other extreme, if $a_{ij}$ is anti-symmetric, i.e. $a_{ij} = -a_{ji}$, then the game theorist calls the game zero sum because the joint payoff in an $i$ vs $j$ game is $a_{ij} + a_{ji} = 0$. Since every matrix can be written as the sum of a symmetric and an anti-symmetric matrix, every linear game equation is the sum of a gradient and a zero-sum game field.
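The decomposition just described is easy to carry out explicitly. The following Python sketch is illustrative only, with hypothetical function names and a hypothetical payoff matrix. It splits a payoff matrix into its symmetric and anti-symmetric parts and evaluates the linear game field $p_i(a_{ip} - a_{pp})$ for each part, using the fact that this field is linear in the matrix $a$.

```python
def split_payoff(a):
    """Return (symmetric part, anti-symmetric part) of a square payoff matrix."""
    n = len(a)
    sym = [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]
    anti = [[0.5 * (a[i][j] - a[j][i]) for j in range(n)] for i in range(n)]
    return sym, anti


def game_field(a, p):
    """Components p_i (a_ip - a_pp) of the linear game vectorfield at p."""
    n = len(p)
    a_ip = [sum(a[i][j] * p[j] for j in range(n)) for i in range(n)]
    a_pp = sum(p[i] * a_ip[i] for i in range(n))
    return [p[i] * (a_ip[i] - a_pp) for i in range(n)]


a = [[-1.0, 2.0], [0.0, 1.0]]   # hypothetical payoff matrix
p = [0.3, 0.7]

sym, anti = split_payoff(a)
gradient_part = game_field(sym, p)    # field of the cooperative (symmetric) game
zero_sum_part = game_field(anti, p)   # field of the zero-sum (anti-symmetric) game
total = game_field(a, p)
print(gradient_part, zero_sum_part, total)
```

For the anti-symmetric part the mean payoff $a_{pp}$ is zero at every $p$, so its field reduces to $p_i a_{ip}$.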

It would be interesting to see if the dynamical systems associated to zero-sum games have special properties.